Neighborhood-Level Particle Pollution Assessment during the COVID-19 Pandemic via a Novel IoT Solution

In recent years, the concentration levels of various air pollutants have been constantly increasing, primarily due to high vehicle flow. In 2020, however, strict lockdowns were imposed in Greece to limit the spread of the COVID-19 pandemic, which led to a rapid reduction in the atmospheric concentration levels of air pollutants such as PM2.5 and PM10. This paper first seeks to identify the correlation between the concentration levels of PM10 and the traffic flow, using data acquired from low-cost IoT devices placed in Thessaloniki, Greece, from March to August 2020. The correlation and linearity between the two parameters were further investigated by applying descriptive analytics, regression techniques, Pearson correlation, and independent t-testing. The obtained results indicate that the concentration levels of PM10 are strongly correlated with the vehicle flow. Therefore, the results suggest that a decrease in vehicle flow could improve ambient air quality. Finally, the results indicate that temperature and humidity are only weakly correlated with the concentration levels of PM10 in the atmosphere.


Introduction
Nowadays, the quality of ambient air is constantly deteriorating in most urban areas, resulting in poor quality of life. Several studies have indicated that a wide range of health problems are caused by high concentrations of air pollutants such as CO, NO, and O3 [1]. The studies presented in [2][3][4][5][6][7][8][9] suggest that high concentrations of air pollutants in urban air have severely affected overall health, particularly that of the cardiorespiratory system. The study conducted by [2] indicated that air pollution caused by vehicle traffic is the primary cause of death in most urban cities. In addition, Han et al. [10] reviewed the negative impact of air pollution on the population of developing countries and identified vehicle traffic as the primary factor behind high concentrations of airborne particles [10]. In 2020, lockdowns were enforced due to the outbreak of the COVID-19 pandemic. The studies presented in [11][12][13][14][15][16] suggest that vehicle traffic decreased by approximately 50% in several cities in Europe, the U.S., Asia, and South America during the first waves of the imposed lockdowns. Consequently, the atmospheric concentrations of several air pollutants, e.g., CO, O3, and particulate matter (PM) mass, decreased by 30% to 70%.
In recent years, several studies have sought to quantify the correlation between the concentration of air pollutants and vehicle flow in European cities. Rojas et al. [17] conducted a preliminary statistical analysis of NO2 concentrations and traffic flow using analysis of variance (ANOVA), which tests whether the means of several groups differ significantly. They acquired measurements before and after the first lockdown and compared them via ANOVA to determine whether the mean value was the same across the groups. Moreover, Sifakis et al. [18] employed several statistical techniques, such as ANOVA, the Pearson correlation coefficient, and Fisher's Least Significant Difference, to investigate the changes during the 2020 lockdown and to calculate the correlation between the concentrations of several air pollutants and vehicle flow. The authors in [19,20] statistically analyzed the concentrations of PM10 and PM2.5 to calculate the correlation between them. Rossi et al. [21] investigated the relationship between the concentrations of several air pollutants, e.g., NO, NO2, NOx, and PM10, and the traffic flow using Spearman's rank correlation test.
Most air quality monitoring applications rely on data collected by sensors, and tests have been conducted to compare the performance of low-cost PM and gas sensors with that of reference equipment. In a recent study, Cichowicz et al. [22] used commercial sensors positioned throughout the study area to collect data over three years (2019, 2020, and 2021). Liu et al. [23] collected data over 13 months at four locations in Australia and China using low-cost air quality monitors, each equipped with two inexpensive sensors and exposed to a variety of pollution sources and concentrations, relative humidity levels, and temperatures. The daily relative errors of PM2.5 and CO concentrations between the monitors and the reference instruments were analyzed to determine the long-term stability of the KOALA monitors used in [23]. Kim et al. [24] created an IoT-based particulate matter monitoring system for construction sites and emphasized that adopting an IP65-grade PM sensor and designing a dustproof and waterproof outlet improved the system's outdoor durability. It is important to note that ambient conditions, including temperature and relative humidity, affect the performance of low-cost sensors in various ways [25,26].
The obtained data often contain noise, outliers, and missing or inconsistent values; therefore, several preprocessing techniques should be applied to obtain a clean dataset. Most real-world applications remove extreme values that deviate from the rest of the dataset by employing normalization or standardization techniques [20,21]. The noise present in a dataset can be reduced by employing data smoothing techniques such as the simple moving average [21,27].
In this paper, several low-cost IoT devices with embedded sensors were placed in buildings in various neighborhoods of Thessaloniki, Greece. Measurements of PM10 and PM2.5 were obtained before, during, and after the first lockdown of 2020. The municipality of Thessaloniki provided data on the city's traffic flow. The obtained measurements were first preprocessed using outlier-removal and smoothing techniques, and the preprocessed data indicate that the levels of PM10 and PM2.5 decreased during the first wave of lockdowns. Therefore, air quality with respect to PM improved in the neighborhoods of Thessaloniki, the second-largest city in Greece.
Additionally, this paper investigates the correlation and linearity between the concentration levels of PM10 and traffic flow, temperature, and humidity. Two null hypotheses were formulated to test for the desired correlation. The first was that traffic flow does not influence the concentration levels of the air pollutants under investigation; the second was that the temperature and humidity at the station's location do not affect the levels of these pollutants. The null hypotheses were tested via quantitative experiments.
The remainder of this paper is organized as follows: Section 2 describes the measurement devices and the selection of the stations used to acquire the measurements. Section 3 presents the methodology and techniques applied to preprocess and analyze the obtained measurements. Section 4 presents the calculated correlation and linearity between the concentration levels of PM10, traffic flow, temperature, and humidity. Finally, Section 5 draws conclusions and outlines future directions.

Data Acquisition
This section describes the devices and the scheme employed to acquire the concentration levels of PM10 and PM2.5. First, the Sympnia IoT Platform is introduced; this is a low-cost Internet of Things (IoT) device that measures the level of air pollution in the atmosphere. Several Sympnia IoT Platforms were placed in Thessaloniki, Greece. This study, however, relied on measurements acquired from four stations in the city, so this section also describes the criteria a station had to fulfill to be selected.

The Sympnia IoT Platform
The concentration levels of PM10 and PM2.5 were acquired from the Sympnia IoT Platform, a low-cost IoT platform that measures air pollution at the neighborhood level, whereas most environmental monitoring systems measure the concentration levels of air pollutants at the city level. The Sympnia IoT Platform relies on the PrismaSense system, introduced by Prisma Electronics, to measure the concentration levels of six air pollutants: NO2, NO, O3, CO, PM2.5, and PM10. In [28], the authors describe the smart collector part (hardware and embedded software) of the platform, together with the hardware system testing, including accuracy, stability, data integrity, and sensitivity tests.
Wireless transmission in the Sympnia IoT Component relies on the Sympnia Smart Collector, which consists of a microcontroller and several interfaces for connecting to the sensors. The Smart Collector preprocesses the obtained measurements to calculate several parameters, with a sampling rate ranging from 1 s to 30 min. Two distinct versions of the Sympnia Smart Collector have been developed; they differ in the microcontroller used to acquire the concentration levels of air pollutants and to enable wireless communication with the remote server.
The processing module of the Smart Collector relies on an ESP32 microcontroller manufactured by Espressif, an ultra-low-power mixed-signal microcontroller with a dual-core 32-bit RISC CPU running at 160 to 240 MHz. Table 1 describes the main features of the ESP32 microcontroller used in the second version, which are presented in detail in [28].
The measurements of PM10 and PM2.5 were acquired from low-cost sensors via an embedded software device referred to as the Sympnia Smart Collector (or Smart Collector). The Smart Collectors can easily be programmed via an in-house environment or a third-party IDE. Advanced calibration circuitry is embedded into the Smart Collectors to remove outliers caused by extreme weather conditions. The Smart Collector is compatible with 5.5 V and 3.3 V sensors that use the UART, I2C, SPI, and GPIO interfaces. Five distinct sensor platforms were implemented, and they are summarized in Table 2 [28]. Calibration is the most important factor in selecting a reliable low-cost sensor. Accordingly, low-cost air quality sensors were selected based on the calibration sheets provided by their manufacturers, the detailed description of data acquisition and transmission, and the sensors' performance as reported in the respective literature. Optionally, the measurements acquired by the Sympnia IoT Platform can be normalized with respect to the temperature and humidity of the microcontroller's environment. Four sensors were embedded in the Smart Collector, and they can be configured to operate simultaneously.
The embedded software of the Sympnia IoT Component has been developed for acquiring, wirelessly transmitting, and visualizing the concentration levels of air pollutants. The Sympnia IoT Platform provides a user-friendly interface that allows users to access the current and forecast levels of PM10 and PM2.5. The measurements are obtained from over 10,000 air monitoring stations and are displayed as an Air Quality Index that everyone can understand, accompanied by a color code that indicates the current air quality level. The Sympnia IoT device also provides additional information obtained from every station. Using the Sympnia IoT Platform, users can visualize the concentration levels of the air pollutants under investigation and download the data for future use [28,29].

Selecting the Data Stations
Hundreds of Sympnia IoT Platform devices were placed in Thessaloniki, Greece, creating a dense network of devices across the city. However, the measurements for the following analysis were obtained from four stations, selected based on three factors: their spatial distribution, the socioeconomic status of the location, and the building's proximity to the road.
Three additional aspects were considered in selecting which sensors should be activated: the parameters indicated by the municipality of Thessaloniki, the efficiency of the sensor, and the calibration sheets provided by the manufacturer. The first of these was applied to every area of the city.
Figure 1 illustrates the four stations selected to obtain the concentration levels of PM10 and PM2.5 based on the aforementioned factors, showing their locations at three scales: Europe, Greece, and Thessaloniki. Three stations were located in different neighborhoods across the city (pins 1, 3, and 4 in Figure 1), whereas the fourth was placed in an industrial area (pin 2 in Figure 1). The stations acquired measurements of only the PM2.5 and PM10 air pollutants.
In the statistical analysis, PM10 was considered the dependent variable, while the city's traffic flow, temperature, and humidity levels were considered the independent variables. The traffic flow data were provided by the municipality of Thessaloniki.

Data Analysis
This section presents the methodology followed to analyze the measurements obtained from the Sympnia IoT Platform, as illustrated in Figure 2. The obtained measurements were preprocessed according to the quality of the data; this inspection of data quality is often referred to as initial quality control. The initial quality control relied on several techniques to remove outliers, fill gaps, and normalize the acquired data. The preprocessed measurements were then visualized. Descriptive analytics were additionally applied to report the main statistics, e.g., the mean, standard deviation, and minimum and maximum values of the acquired concentration levels of air pollutants. These summary statistics provide a better understanding of the quality of the accumulated measurements.
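The descriptive statistics listed above can be sketched in a few lines of Python. This is a minimal illustration, assuming the preprocessed PM10 series is available as a plain list of numbers; the function name `describe` and the sample values are hypothetical and are not the study's data or tooling. Standard error and coefficient of variation are included because the paper reports them alongside the mean and standard deviation.

```python
import math
import statistics


def describe(values):
    """Summary statistics for a preprocessed PM10 series (illustrative sketch)."""
    n = len(values)
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1 denominator)
    return {
        "n": n,
        "mean": mean,
        "std": sd,
        "std_error": sd / math.sqrt(n),  # standard error of the mean
        "cv": sd / mean,                 # coefficient of variation
        "min": min(values),
        "max": max(values),
    }


# Hypothetical PM10 readings in µg/m³ (not measurements from the study)
pm10 = [2, 4, 4, 4, 5, 5, 7, 9]
summary = describe(pm10)
print(summary["mean"], summary["min"], summary["max"])
```

Using the sample (n − 1) standard deviation is an assumption here; the paper does not state which variant its tables use.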
Thereupon, according to Figure 2, the impact of other factors on the concentration levels of the air pollutants was estimated by applying inferential statistics. The acquired measurements were processed by considering PM10 as the dependent variable and the temperature, traffic flow, and humidity levels as the independent variables. The linear relationship between the dependent and independent variables was calculated by applying linear regression, followed by ANOVA. The inferential analysis also included other techniques, such as the independent samples t-test and Pearson correlation.

Data Cleansing
Data cleansing is essential due to the presence of outliers and noise in the acquired data. Outliers were removed using the Median Absolute Deviation (MAD): a measurement $x_i$ was removed when it satisfied

$$ |x_i - \mu| > k \cdot \mathrm{MAD}, \qquad \mathrm{MAD} = \operatorname{median}_i\left( \left| x_i - \operatorname{median}_j(x_j) \right| \right) $$

where $\mu$ denotes the mean value and $k$ controls how aggressive the removal is. Additionally, the data were smoothed by applying a moving average window of length $L$, which is mathematically defined as:

$$ \mathrm{SMA} = \frac{1}{L} \sum_{i=1}^{L} a_i $$

where $a_i$ corresponds to the $i$th element in the moving window, and the window length $L$ was set to 15.
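The two cleansing steps can be sketched as follows. This is a minimal Python illustration, not the authors' implementation. The function names and the default `k=3.0` are hypothetical. Note one deliberate swap: the text defines the centre $\mu$ as the mean, but this sketch measures distances from the median, the usual robust choice, so that a single spike cannot pull the centre away from the bulk of the data. Replace the median with `statistics.mean` to follow the text literally.

```python
import statistics


def remove_outliers_mad(values, k=3.0):
    """Keep points whose distance from the centre is at most k * MAD.

    NOTE: the text defines the centre (mu) as the mean; this sketch uses the
    median instead, the usual robust choice for MAD-based filtering.
    """
    center = statistics.median(values)
    mad = statistics.median([abs(x - center) for x in values])
    if mad == 0:
        # Degenerate case (most points equal the median): no threshold can be
        # scaled from the MAD, so the series is returned unchanged.
        return list(values)
    return [x for x in values if abs(x - center) <= k * mad]


def simple_moving_average(values, window=15):
    """Trailing simple moving average: (1/L) * sum of the last L points.

    The output has len(values) - window + 1 points; window=15 mirrors L = 15.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    if window > len(values):
        return []
    running = sum(values[:window])
    smoothed = [running / window]
    for i in range(window, len(values)):
        running += values[i] - values[i - window]  # slide the window by one step
        smoothed.append(running / window)
    return smoothed


# Hypothetical series with one spike (illustrative values only)
cleaned = remove_outliers_mad([10, 11, 12, 11, 10, 200])
print(cleaned)
```

Dropping points shortens the series. For time-aligned analysis against traffic counts, one may instead prefer to mark outliers as missing and gap-fill them, which is consistent with the gap-filling step mentioned in the methodology.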

Independent Sample T-test
An independent samples t-test is a statistical test used to determine whether there is a significant difference between the means of two groups, and its outcome is judged against the alpha level. The alpha level is the threshold for rejecting the null hypothesis, i.e., the maximum accepted probability of a Type I error. A Type I error occurs when the null hypothesis is rejected in favor of the alternative although it is actually true. The p-value is the probability of obtaining results at least as extreme as the observed results under the assumption that the null hypothesis is correct. If the p-value is less than the alpha level, the difference between the group means is statistically significant, i.e., the grouping (independent) variable has a significant effect on the dependent variable. The smaller the p-value, the stronger the evidence against the null hypothesis, and therefore the more confidently it can be rejected [30][31][32].
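The test described above can be sketched using only the standard library. This minimal Python version assumes Student's pooled-variance form; the paper does not say whether a pooled or Welch variant was used. The function name and the example groups are hypothetical. The exact p-value requires the t distribution, which Python's standard library does not provide. The sketch therefore approximates it with a standard normal distribution, which is only reasonable for large degrees of freedom, such as long series of hourly readings.

```python
import math
import statistics


def independent_t_test(a, b):
    """Student's independent-samples t-test with pooled variance.

    Returns (t, df, p_approx). p_approx is two-sided and uses a normal
    approximation to the t distribution (assumption: large df).
    """
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    v1, v2 = statistics.variance(a), statistics.variance(b)  # sample variances
    df = n1 + n2 - 2
    pooled = ((n1 - 1) * v1 + (n2 - 1) * v2) / df
    se = math.sqrt(pooled * (1 / n1 + 1 / n2))  # standard error of m1 - m2
    t = (m1 - m2) / se
    p_approx = 2 * (1 - statistics.NormalDist().cdf(abs(t)))
    return t, df, p_approx


# Hypothetical "during lockdown" vs. "after lockdown" PM10 groups
t, df, p = independent_t_test([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
alpha = 0.05
print(t, df, "reject H0" if p < alpha else "fail to reject H0")
```

For small samples, an exact p-value from a statistics package (for example SciPy's `scipy.stats.ttest_ind`) would be preferable to the normal approximation.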

Pearson Correlation Coefficients
The Pearson correlation coefficient measures the linear dependence between two variables and is defined as the covariance of the two variables divided by the product of their standard deviations, which serves as the normalization factor. Mathematically, the Pearson correlation coefficient is defined as:

$$ r_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}} $$

where $n$ is the number of acquired measurements, $x_i$ and $y_i$ are the $i$th elements of the variables X and Y, and $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$ and $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$ are their mean values. The value of the coefficient $r_{xy}$ ranges from −1 to +1 and indicates the strength and direction of the linear relationship between X and Y. If X and Y are positively correlated, the coefficient is greater than zero; if it is less than zero, the two variables are negatively correlated. Finally, if the coefficient equals 0, X and Y are linearly uncorrelated [33].
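Following the definition above (covariance divided by the product of the standard deviations), here is a direct Python sketch. The function name `pearson_r` and the example numbers are hypothetical. On Python 3.10 and later, `statistics.correlation` computes the same quantity.

```python
import math


def pearson_r(x, y):
    """Pearson correlation r_xy: centred cross-products over the product of norms."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - y_bar) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        raise ValueError("r is undefined when a variable is constant")
    return cov / (sx * sy)


# Hypothetical hourly vehicle counts vs. PM10 (illustrative values only)
vehicles = [120, 340, 560, 800]
pm10 = [18, 25, 31, 40]
print(round(pearson_r(vehicles, pm10), 3))
```

The 1/n factors in the covariance and the standard deviations cancel, which is why the sketch works directly with the sums.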

Linear Regression
Regression is a technique used either to solve forecasting problems or, in statistical analysis, to determine the linear relationship between dependent and independent variables. Regression models aim to predict the value of the dependent variable Y from the values of the independent variables X. There are two types of linear regression, simple and multiple, depending on the number of independent variables. Regression analysis relies on several assumptions concerning the independent and dependent variables. In short, these assumptions state that (i) the relationship between the two variables must be linear, (ii) the observations in a sample are independent of each other, and (iii) the samples should not contain many outliers [34,35].
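For the simple case (one independent variable, e.g. vehicle count), the ordinary least squares fit has a closed form. The minimal Python sketch below returns the intercept, the slope, R², and an RMSE. The function name and the data are hypothetical. The RMSE here is √(SSE/n). Statistical packages often report √(SSE/(n − 2)) instead, as the "standard error of the estimate", so values may differ slightly from the paper's tables.

```python
import math


def fit_simple_linear_regression(x, y):
    """OLS fit of y = b0 + b1 * x; returns (b0, b1, r2, rmse)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx                # slope
    b0 = y_bar - b1 * x_bar       # intercept
    residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
    sse = sum(r * r for r in residuals)          # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)     # total variation
    r2 = 1 - sse / sst
    rmse = math.sqrt(sse / n)
    return b0, b1, r2, rmse


# Hypothetical data lying exactly on y = 1 + 2x
print(fit_simple_linear_regression([1, 2, 3, 4], [3, 5, 7, 9]))
```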

Analysis of Variance
ANOVA partitions the variability of the observations in a dataset into two parts: systematic and random factors. The former has a statistical influence on the dataset, while the latter does not. A significant p-value indicates that the mean difference of at least one pair of groups is statistically significant. Such pairs are identified by pairwise multiple comparisons, or post hoc tests, whose choice depends on whether the variances of the different groups are homogeneous. Post hoc tests are used to determine the significant pair or pairs after the respective ANOVA has been found significant [30].
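The partition into systematic (between-group) and random (within-group) variability can be made concrete with a one-way ANOVA F statistic. The minimal Python sketch below is illustrative only. The function name and example groups are hypothetical. The p-value needs the F distribution, which the standard library lacks, so in practice the F value would be compared with a critical value or passed to a statistics package (SciPy provides `scipy.stats.f_oneway`).

```python
def one_way_anova(*groups):
    """One-way ANOVA F statistic; returns (F, df_between, df_within).

    Between-group sum of squares = systematic part,
    within-group sum of squares = random part.
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum(sum((x - m) ** 2 for x in g) for g, m in zip(groups, means))
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical PM10 groups: before, during, and after the lockdown
print(one_way_anova([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```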
The ANOVA that follows the calculation of a regression table is called an ANOVA table or regression ANOVA. It tests the null hypothesis that all of the regression slope coefficients are equal to zero, i.e., that there is no significant relationship between the dependent variable and the independent variables in the model. The regression ANOVA table summarizes the sources of variation in the regression model and tests whether the model explains a significant proportion of the variation in the dependent variable, by comparing the variability explained by the regression model with the variability it leaves unexplained.
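The comparison of explained and unexplained variability can be written out as an ANOVA table. This minimal Python sketch assumes the fitted values `y_hat` come from an ordinary least squares model with an intercept; for such a model, SSR + SSE equals the total sum of squares. The function name and the dictionary keys are hypothetical. The F statistic is the regression mean square divided by the residual mean square.

```python
def regression_anova(y, y_hat, n_predictors):
    """Regression ANOVA table for an OLS fit with intercept (sketch).

    Returns sums of squares, degrees of freedom, mean squares and F.
    """
    n = len(y)
    y_bar = sum(y) / n
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # explained by the model
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # residual (unexplained)
    df_reg = n_predictors
    df_res = n - n_predictors - 1
    ms_reg = ssr / df_reg
    ms_res = sse / df_res
    return {"SSR": ssr, "SSE": sse, "df_reg": df_reg, "df_res": df_res,
            "MS_reg": ms_reg, "MS_res": ms_res, "F": ms_reg / ms_res}


# Hypothetical data with OLS fit y_hat = 2.2 + 0.6 * x
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
table = regression_anova(y, [2.2 + 0.6 * xi for xi in x], n_predictors=1)
print(round(table["F"], 3))
```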

Results
This section presents the results of the measurements from the four stations located in Thessaloniki, Greece. The next subsection describes the descriptive analysis of the PM10 and PM2.5 measurements obtained via the Sympnia IoT Platform; all measurements were preprocessed via the schemes presented in Sections 2 and 3. A subsequent subsection presents the statistical analysis, in which two null hypotheses were tested. The first stated that traffic flow does not affect the levels of PM10 in the atmosphere; the second stated that temperature and humidity do not influence the concentration levels of the pollutants. The results were obtained assuming that all variables follow normal distributions.
Figure 4 illustrates the measurements of the concentration levels of PM2.5 and PM10 from Station 2. It shows that the concentration levels of both PM2.5 and PM10 were rather low between March and June 2020, the period during which the first wave of lockdowns was imposed in Greece. The lockdown reduced the need to travel by vehicle, which led to a reduction in the concentration levels of PM10 and PM2.5. The concentration levels then increased again as the restrictions were relaxed, as shown in Figure 4.

Preliminary Descriptive Analysis of the Measurements
Figure 5 depicts the measurements of the concentration levels of PM10 from Station 2, where the yellow, blue, and green segments correspond to measurements obtained before, during, and after the first wave of lockdowns, respectively. Figure 5 shows that both noise and outliers were present in the acquired measurements, as is evident from the sudden changes in the values. Figure 6 portrays the same measurements after preprocessing, i.e., after smoothing and removing the outliers.
Figure 7 illustrates the correlation between the concentration levels of PM10 and PM2.5 acquired by Station 2 and Station 4, with PM2.5 as the independent variable and PM10 as the dependent variable. The results show that the concentration levels of PM10 rose as the values of PM2.5 increased. In other words, the two air pollutants were, as expected, highly positively correlated.

Statistical Analysis for the Measurements of Station 2
The available measurements were analyzed using various techniques. The measurements of PM2.5 and PM10 were highly correlated, as depicted in Figure 7; therefore, we decided to include only the PM10 measurements in the statistical analysis. The statistical analysis that follows excludes the measurements of Stations 1 and 3, because no traffic flow data were available to calculate the desired correlation at those locations. In this subsection, Station 2 is analyzed, and the first null hypothesis test was applied to the PM10 measurements obtained by Station 2. Descriptive statistics and the independent t-test were applied to the preprocessed measurements, and the results are shown in Tables 3 and 4. Table 3 reports the mean, standard deviation, standard error, and coefficient of variation for the two groups (before and after the traffic restrictions). Table 4 shows that the p-value was rather low; as a result, the null hypothesis, which states that traffic flow does not affect the concentration levels of PM10, can be rejected. The correlation between PM10 and traffic flow was also calculated via the Pearson correlation coefficient. The correlation coefficients between the PM10 measurements and the vehicle flow, presented in Table 5, are high, indicating a strong positive correlation between the two variables.
Tables 6 and 7 present the results of the linear regression and the ANOVA. The null hypothesis H0 states that the vehicle flow does not affect the concentration levels of PM10, while the alternative hypothesis H1 states that there is a linear relationship between the dependent variable (PM10) and the independent variable (number of vehicles). The regression statistics in Table 6 show that the RMSE under the null hypothesis H0 is rather high, so the hypothesis that the traffic flow does not affect the concentration levels is rejected. Conversely, the low RMSE and high R value under the alternative hypothesis H1 indicate a strong linear relationship between the two variables. Table 7 shows the degrees of freedom (df), the sum of squares, the mean square, the F statistic, and the overall significance of the regression model under H1. The F statistic is calculated as the ratio of the regression mean square to the residual mean square, and it indicates whether the regression model fits the data better than a model that contains no independent variables. According to Table 7, the p-value is below 0.001, so there is sufficient evidence to conclude that the regression model fits the data better than the model with no predictor variable. Lastly, Table 8 presents the overall significance of the regression model; consistent with Tables 6 and 7, its low p-value leads to rejecting the null hypothesis H0 and accepting H1, i.e., the data support the hypothesis that the traffic flow influences the levels of PM10.
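To make the quantities in Tables 6 and 7 concrete, the following is a minimal standard-library sketch of a one-predictor ordinary least squares fit and its ANOVA decomposition. It assumes paired lists of vehicle counts and PM10 values; the function name `simple_regression_anova` and the toy numbers are hypothetical, not the authors' code or data. Here `rmse` is the square root of the residual mean square of the fitted model (H1), and `rmse_null` is the corresponding value for the intercept-only model implied by H0.

```python
from math import sqrt
from statistics import fmean


def simple_regression_anova(x, y):
    """Fit y = b0 + b1 * x by ordinary least squares and return ANOVA quantities."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need at least 3 paired observations")
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b1 = sxy / sxx          # slope
    b0 = my - b1 * mx       # intercept
    fitted = [b0 + b1 * xi for xi in x]
    ss_total = sum((yi - my) ** 2 for yi in y)
    ss_resid = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_reg = ss_total - ss_resid
    df_reg, df_resid = 1, n - 2
    ms_reg = ss_reg / df_reg
    ms_resid = ss_resid / df_resid
    return {
        "intercept": b0,
        "slope": b1,
        "r_squared": ss_reg / ss_total,
        "rmse": sqrt(ms_resid),                  # model with the predictor (H1)
        "rmse_null": sqrt(ss_total / (n - 1)),   # intercept-only model (H0)
        "df": (df_reg, df_resid),
        "F": ms_reg / ms_resid,                  # regression MS / residual MS
    }


# Toy numbers for illustration only (not the study's measurements).
vehicles = [1, 2, 3, 4, 5]
pm10 = [2, 4, 5, 4, 5]
result = simple_regression_anova(vehicles, pm10)
print(result)
```

For these toy values the fit gives a slope of about 0.6, R² of about 0.6 and F of about 4.5 on (1, 3) degrees of freedom. The p-value of the F test is not computed here; with SciPy available it could be obtained as `scipy.stats.f.sf(F, df1, df2)`.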

Statistical Analysis for the Measurements of Station 4
In this subsection, the measurements of Station 4 are analyzed. Whereas for Station 2 only PM10 and vehicle traffic data were available, for Station 4 the humidity and temperature of the environment were also recorded. We therefore repeated the same evaluation, this time testing the dependent variable (PM10) against three independent variables (vehicle flow, humidity, and temperature). The temperature measurements were obtained from May to July 2020, as illustrated in Figure 8, which also shows the scatter plots of the PM10 concentration levels against the traffic flow. These plots indicate that the PM10 measurements were strongly correlated with the traffic flow and weakly correlated with the temperature. The results in Table 9 confirm this: the high Pearson correlation coefficient indicates a strong correlation between the PM10 concentration levels and the vehicle flow, whereas the low coefficients between PM10 and temperature, and between PM10 and humidity, indicate that the concentration levels of the pollutant are only weakly correlated with temperature and humidity. Finally, there is an expected negative correlation between humidity and temperature. Tables 10 and 11 present the results of the linear regression and the ANOVA for the measurements of Station 4.
In this case, the null hypothesis H0 states that the independent variables do not affect the concentration levels of PM10, while the alternative hypothesis H1 states that there is a linear relationship between the dependent variable and the three independent variables. The regression statistics in Table 10 support accepting the alternative hypothesis H1, since the R value is high and the RMSE value is low. Table 11 reports the overall significance of the regression model under H1; here the regression degrees of freedom equal 3 because there are three independent variables, in contrast to the single predictor used for Station 2. The low p-value confirms that there is sufficient evidence to conclude that the regression model fits the data better than the model with no predictor variables. A significant overall F statistic, however, does not reveal which individual predictors contribute to the fit, so each coefficient must be examined separately to assess the impact of the three independent variables on the dependent variable. Table 12 presents these significance tests for the coefficients of the regression model. The p-values for temperature and humidity are high, whereas for the number of vehicles the t-value is very high and the p-value is sufficiently low. We therefore conclude that only the number of vehicles has a statistically significant linear relationship with the dependent variable PM10.
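The per-coefficient reasoning behind Table 12 can be sketched as a multiple ordinary least squares fit with an intercept, where each coefficient's standard error comes from the diagonal of (XᵀX)⁻¹ scaled by the residual mean square, and the t-value is the coefficient divided by its standard error. This is a standard-library sketch assuming plain OLS on time-aligned samples; the helpers `invert` and `ols_with_t_values`, the predictor keys, and the toy data are hypothetical. p-values are omitted; they would follow from the t distribution with n − k − 1 degrees of freedom (for example `2 * scipy.stats.t.sf(abs(t), df)`).

```python
from math import sqrt
from statistics import fmean


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (collinear predictors?)")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [rv - factor * cv for rv, cv in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def ols_with_t_values(predictors, y):
    """OLS with intercept; predictors maps a name to a list aligned with y."""
    names = ["intercept"] + list(predictors)
    n, k = len(y), len(predictors)
    p = k + 1
    # Design matrix: a column of ones for the intercept, then one column per predictor.
    X = [[1.0] + [predictors[name][i] for name in names[1:]] for i in range(n)]
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    xtx_inv = invert(xtx)
    beta = [sum(xtx_inv[i][j] * xty[j] for j in range(p)) for i in range(p)]
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in X]
    my = fmean(y)
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ssr = sum((fi - my) ** 2 for fi in fitted)
    df_resid = n - p
    ms_resid = sse / df_resid          # residual mean square
    coefficients = {}
    for i, name in enumerate(names):
        se = sqrt(ms_resid * xtx_inv[i][i])
        coefficients[name] = {"coef": beta[i], "se": se, "t": beta[i] / se}
    return {
        "coefficients": coefficients,
        "F": (ssr / k) / ms_resid,     # overall F with (k, n - k - 1) df
        "df": (k, df_resid),
    }


# Toy data (not the study's measurements): the response follows "vehicles"
# closely, while "temperature" is an unrelated toy column.
toy = ols_with_t_values(
    {"vehicles": [1, 2, 3, 4, 5, 6, 7, 8],
     "temperature": [3, 1, 4, 1, 5, 9, 2, 6]},
    [5.1, 7.9, 11.05, 13.95, 17.1, 19.9, 23.05, 25.95],
)
for name, est in toy["coefficients"].items():
    print(f"{name}: coef={est['coef']:.3f}, t={est['t']:.2f}")
```

In this toy run the `vehicles` coefficient has a very large t-value while the `temperature` coefficient stays near zero, mirroring the pattern the text describes; with three predictors, `df` would be (3, n − 4).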

Conclusions
The concentration levels of air pollutants such as PM10 and PM2.5 are constantly increasing in urban areas, and recent studies indicate negative impacts on human health. This paper sought to investigate the relationship, and in particular the correlation and linearity, between the concentration levels of PM10, traffic flow, temperature, and humidity. The measurements were obtained from several IoT devices placed at stations in Thessaloniki, Greece. The city of Thessaloniki was selected because its infrastructure is similar to that of most European cities. The acquisition period covered the time before, during, and after the first wave of COVID-19 lockdowns. The plots of the preliminary descriptive analysis of the PM measurements for all stations indicate that:
1. Surrounding conditions affect the concentration levels of PM, since Station 2, which is located on one of the busiest streets, showed higher levels.
2. The PM2.5 and PM10 variables are highly correlated.
3. Traffic flow has a positive influence on the concentration of PM, since PM levels changed during the COVID-19 lockdown period.
The obtained measurements were preprocessed prior to the statistical analysis; more specifically, outliers and noise were removed from the data. Inferential statistics of the measurements were then reported, and the correlation was evaluated using two null hypotheses. The first stated that the traffic flow does not affect the concentration of PM10 in the atmosphere, while the second stated that the temperature and humidity at the stations do not influence the concentration levels of the air pollutant. The statistical analysis of the PM10 measurements for Station 2 and Station 4 indicated that:
1. PM10 and vehicle flow are strongly correlated variables.
2. Temperature and humidity are weakly correlated with PM10 compared to the correlation mentioned above.
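The preprocessing step is not specified in detail here, so the following is only an illustration of one common choice, not the authors' method: Tukey's interquartile-range fences to drop outlying readings, followed by a simple moving average to damp noise. The fence multiplier `k = 1.5`, the window size, the `(timestamp, value)` sample layout, and the helper names are all assumptions.

```python
from statistics import fmean, quantiles


def remove_iqr_outliers(samples, k=1.5):
    """Drop (timestamp, value) pairs whose value lies outside Tukey's fences."""
    values = [v for _, v in samples]
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [(t, v) for t, v in samples if low <= v <= high]


def moving_average(values, window=3):
    """Simple moving average over consecutive windows (len(values) - window + 1 points)."""
    if not 1 <= window <= len(values):
        raise ValueError("window must be between 1 and len(values)")
    return [fmean(values[i:i + window]) for i in range(len(values) - window + 1)]


# Toy hourly PM10 readings with one spike (not the study's data).
raw = list(enumerate([10, 12, 11, 13, 12, 200, 11]))
clean = remove_iqr_outliers(raw)
smoothed = moving_average([v for _, v in clean], window=3)
print(clean, smoothed)
```

Keeping timestamps alongside the values lets the cleaned PM10 series still be aligned with the vehicle-flow counts afterwards; note that smoothing after removal averages across the dropped sample.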
Regarding future research directions, other preprocessing techniques could be applied to remove outliers and noise from the obtained measurements. Additionally, data from more stations should be used to measure air pollutant concentrations and to average the results over an entire city or region, using a network of sensors distributed throughout the area. It is also important to measure other air pollutants, such as O3, NO2, CO, and SO2. Finally, in the interest of sustainability, additional environmental factors describing atmospheric dispersion conditions, especially wind speed, the presence of temperature inversions, and the mixing layer height, should be measured to investigate their impact on the concentration levels of various air pollutants.

Figure 1. The distribution of the stations in the city of Thessaloniki, with scale and spatial orientation.

Figure 3 illustrates the time series of the PM2.5 concentration levels acquired from the four stations between February and August 2020. The bottom-right panel of Figure 3 shows a period of missing data for Station 4 from June to July 2020. The high PM2.5 levels in the top-right panel are consistent with Station 2 being located on one of the busiest streets in Thessaloniki.
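A period of missing data such as the one for Station 4 can be located programmatically by comparing consecutive timestamps. The sketch below assumes hourly sampling; the sampling interval is not stated here, so `expected_step`, the helper name `find_gaps`, and the toy timestamps are hypothetical.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, expected_step):
    """Return (last_before_gap, first_after_gap) pairs where consecutive
    samples are further apart than expected_step."""
    ordered = sorted(timestamps)
    return [(a, b) for a, b in zip(ordered, ordered[1:]) if b - a > expected_step]


# Toy hourly series with a break between 1 June and 1 July (illustrative only).
stamps = [datetime(2020, 6, 1, h) for h in range(3)] + [datetime(2020, 7, 1, 0)]
gaps = find_gaps(stamps, expected_step=timedelta(hours=1))
print(gaps)
```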

Figure 3. The PM2.5 measurements of the four stations.

Figure 4. The PM2.5 and PM10 measurements of the second station.

Figure 5. The raw time series of PM10 at the second station.

Sustainability 2023, 15, x FOR PEER REVIEW 10 of 16

Figure 6. The preprocessed time series of PM10 at the second station.

Figure 7 illustrates the correlation between the concentration levels of the PM10 and PM2.5 air pollutants acquired by Station 2 and Station 4. PM2.5 is the independent variable, while PM10 is the dependent variable. The results suggest that the concentration levels of the PM10 air pollutant rose as the values of the PM2.5 air pollutant increased. In other words, the two air pollutants were highly positively correlated, as expected.
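The Pearson coefficients discussed for Figure 7 and reported in Tables 5 and 9 follow the standard definition r = Sxy / √(Sxx·Syy). A minimal standard-library sketch is given below; the column names and toy values are hypothetical rather than the study's data. On Python 3.10 or later, `statistics.correlation(x, y)` computes the same coefficient.

```python
from math import sqrt
from statistics import fmean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def correlation_matrix(columns):
    """Pairwise Pearson coefficients for a dict of name -> aligned samples."""
    names = list(columns)
    return {(a, b): pearson(columns[a], columns[b]) for a in names for b in names}


# Toy values only; column names are hypothetical.
matrix = correlation_matrix({
    "pm25": [1, 2, 3, 4, 5],
    "pm10": [2, 4, 5, 4, 5],
    "humidity": [5, 4, 3, 2, 1],
})
for pair, r in matrix.items():
    print(pair, round(r, 3))
```

The matrix is symmetric with ones on the diagonal, so a table such as Table 9 only needs its upper triangle.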

Figure 7. The correlation between the measurements of PM2.5 and PM10 air pollutants.

Figure 8. The plots for Station 4.

Table 1. Technical specifications of the second version of the Sympnia Smart Collector [28].

Table 3. Descriptive statistical analysis of PM10 air pollutants.

Table 4. Results obtained after the Independent T-test.

Table 5. Results obtained after calculating the Pearson Correlation for Station 2.

Table 6. Regression statistics for the measurements of Station 2.

Table 7. ANOVA for the measurements of Station 2.

Table 8. The overall significance of the regression model for the measurements of Station 2.

Table 9. Results obtained after calculating the Pearson Correlation for Station 4.

Table 10. Regression statistics for the measurements of Station 4.

Table 11. ANOVA for the measurements of Station 4.

Table 12. The overall significance of the regression model for the measurements of Station 4.