3.1. Comparison with the Reference Instrument in Rabka-Zdrój
The comparative measurements of the two low-cost sensors (C1 and C2) against the reference instrument spanned a period of varied meteorological conditions and PM concentrations (Table 1, Table 2 and Table 3); for this reason, the analysis is presented separately for each month (February–June 2017). A graphical comparison of the raw measurements from the low-cost sensors and the reference instrument (RIEP) can be found in Figure 2. The reference instrument measured only the PM10 fraction; therefore, the analysis omitted the remaining fractions measured by the low-cost sensors.
Table 1 presents the daily minimum, maximum, and average temperature and relative humidity in particular months, obtained from the RIEP station. The average daily temperature ranged from −2.5 °C (25 February 2017) to over +20 °C (30 May 2017), and the daily average relative humidity ranged from 52% (2 June 2017) to 98% (25 May 2017). In the colder months (February, March, and April), it was usually close to 80%, while in the warmer months (May and June) it was about 60%.
Table 2 and Table 3 show the values of some of the statistical parameters describing the raw PM10 measurements made by both the low-cost devices, in particular months, and for the whole measurement period.
The collected results show a high correlation between the raw PM10 measurements from the low-cost devices and the measurements from the reference instrument. Pearson's correlation coefficient ranges from r = 0.79 to r = 0.97 for device C1 and from r = 0.77 to r = 0.91 for device C2 (similar to the results presented, e.g., in [13]). For both devices, the correlation is positive. The highest r values were observed in the months with the highest PM10 concentrations, while for April, May, and June the correlation was slightly weaker. It is worth mentioning that, for the worst-performing low-cost sensors reported elsewhere, the coefficient of determination R² was below 0.2. An example of such a case is the comparative measurements conducted within EuNetAir in Portugal [27]. Compared with the values obtained in that study (R² from 0.13 to 0.36 for the Shinyei PPD42 and Shinyei PPD20V sensors), the sensors used in Rabka-Zdrój performed much better in terms of correlation. It should be remembered that climatic conditions were also much more diverse in Rabka-Zdrój than in Portugal.
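Because the two studies report different statistics (r here, R² in [27]), it helps to put them on one scale: for a simple linear fit, R² equals r², so r = 0.77–0.97 corresponds to R² of roughly 0.59–0.94. A minimal Python sketch of the calculation is given below; the function name and the sample values are illustrative assumptions, not data from this study.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical 24-h PM10 averages in ug/m3 (illustrative, not study data).
reference = [12.0, 18.0, 25.0, 31.0, 40.0]
sensor = [17.0, 24.0, 36.0, 44.0, 61.0]

r = pearson_r(reference, sensor)
r_squared = r ** 2  # equals R^2 of a simple linear regression of one on the other
print(f"r = {r:.3f}, R2 = {r_squared:.3f}")
```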
In the vast majority of cases, the tested sensors overestimated the measured values. The measured concentrations were usually 30–50% higher than those from the reference instrument. Device C2 was slightly more stable: its concentrations in February, March, and May were on average 34–36% higher than those of the reference instrument, and in April the average measurement results were higher by 50%. For device C1, the measured values were on average 38–48% higher than those from the reference instrument. Much higher values, over 80%, were obtained only in February. The reason for this much larger deviation, compared to the remaining months, turns out to be the period 15–20 February, when the sensor indicated values twice as high as those of the reference instrument.
The tendency to overestimate the measured concentrations is also indicated by other statistical parameters, e.g., the small differences between the absolute and relative errors. The mean error values were the highest at the beginning of the period. At the end of spring, both devices overstated the average measurements by no more than 10 μg/m³.
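One way to read this observation: if a sensor overestimates on every single day, the mean error and the mean absolute error coincide, so a small gap between signed and absolute error measures points to a systematic bias. The sketch below uses common definitions of these measures; the exact definitions behind Table 2 and Table 3 are not given in the text, so the function name, the dictionary keys, and the sample values are assumptions made for illustration.

```python
def error_metrics(sensor, reference):
    """Signed and absolute error measures of sensor readings vs. a reference."""
    n = len(reference)
    diffs = [s - r for s, r in zip(sensor, reference)]
    percent = [100.0 * d / r for d, r in zip(diffs, reference)]
    return {
        "mean_error": sum(diffs) / n,
        "mean_absolute_error": sum(abs(d) for d in diffs) / n,
        "mean_percentage_error": sum(percent) / n,
        "mean_absolute_percentage_error": sum(abs(p) for p in percent) / n,
    }


# Hypothetical daily PM10 averages in ug/m3; the sensor reads high every day.
metrics = error_metrics(sensor=[15.0, 26.0, 39.0], reference=[10.0, 20.0, 30.0])
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With a consistently positive deviation, the signed and absolute variants print the same value; they diverge as soon as some days are underestimated.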
Figure 3 presents the relations between the measurement results (minimum, average, and maximum) from both the low-cost sensors and the reference instrument. The measurement range was divided into seven intervals, each of which covered 5 μg/m³. The minimum concentration of PM10 in the analyzed measurement period was 10 μg/m³, while 44 μg/m³ was assumed as the upper limit, since only individual values (24-h averages) were recorded above it. Figure 3 does not include values that exceed these levels, so as not to impair the readability of the illustrations.
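The interval summary described above can be sketched as follows. The sketch assumes that days are grouped by the reference concentration and that the intervals are half-open, [10, 15), [15, 20), ..., [40, 45); neither detail is stated explicitly in the text, and the function name and sample values are hypothetical.

```python
def bin_summary(reference, sensor, lower=10.0, upper=45.0, width=5.0):
    """Group sensor readings by reference-concentration interval.

    Returns a list of (start, end, min, mean, max) tuples, one per
    non-empty interval; days outside [lower, upper) are skipped.
    """
    bins = {}
    for ref, sen in zip(reference, sensor):
        if not (lower <= ref < upper):
            continue  # outside the plotted range
        index = int((ref - lower) // width)
        bins.setdefault(index, []).append(sen)

    summary = []
    for index in sorted(bins):
        values = bins[index]
        start = lower + index * width
        mean = sum(values) / len(values)
        summary.append((start, start + width, min(values), mean, max(values)))
    return summary


# Hypothetical 24-h averages in ug/m3 (not study data).
reference = [10.0, 12.0, 17.0, 44.0, 50.0]
sensor = [14.0, 18.0, 25.0, 60.0, 80.0]
for start, end, lo, mean, hi in bin_summary(reference, sensor):
    print(f"[{start:.0f}, {end:.0f}): min={lo:.1f} mean={mean:.1f} max={hi:.1f}")
```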
For PM10 concentrations in the range from 10 μg/m³ to 44 μg/m³, Figure 3 indicates a nearly linear dependence between the measurements from both sensors and the reference instrument. The overestimation by the low-cost sensors increased slightly at higher PM10 concentrations (over 30 μg/m³). In particular, for the analyzed intervals, the minimum values indicated by the low-cost sensors are very close to the average values measured by the reference instrument. The maximum values, in turn, significantly exceed the values from the reference instrument.
Considering the entire measurement period, both devices are characterized by high correlation coefficients. The measurement errors, in particular the absolute ones, are unfortunately also high, which is mainly due to the high concentrations observed during the cold period. The high correlation coefficients and the quite similar behavior of both sensors make it potentially possible to use a correction function that minimizes the measurement errors. An effective correction function can be determined because the training data set is quite extensive (242 daily averages), statistically significant, and covers various meteorological conditions (from winter to summer, in a temperate climate).
Practical observations indicated that the percentage deviations between the measurement results from the low-cost sensors and the reference instrument were greater at higher PM concentrations. This hypothesis was confirmed by the analysis: for the tested sensors, the best fit (among linear, exponential, logarithmic, polynomial, and power functions) was given by a 2nd-degree polynomial. The regression and correlation coefficients are presented in Table 4 and Figure 4.
In an attempt to identify the factors behind the significant overestimation by the low-cost sensors, the deviations of the PM10 concentrations measured by the low-cost sensors from those of the reference instrument were analyzed in relation to the most important meteorological parameters. Table 5 and Table 6 present Pearson's correlation coefficients between the deviations in PM10 concentrations and temperature, and between the deviations in PM10 concentrations and relative humidity, in particular months of the measurement period. All of these correlation coefficients are statistically significant, with p-values less than 0.01. The 24-h averages of the respective values were used for the calculations.
The results indicate that the degree of overestimation or underestimation of the PM10 measurements is related to some meteorological parameters. This effect is particularly strong in the case of relative humidity. Figure 5 shows the deviations of the measurement values from the two low-cost sensors relative to the reference instrument, depending on the relative humidity.
The deviations of the measured PM10 concentrations from the reference measurements showed a moderate positive correlation with relative humidity (r = 0.28–0.67, i.e., from rather low to relatively high). In general, the correlation coefficients were higher (up to r = 0.67) for the colder months, when a high relative humidity (over 90%) occurred more often, and lower (r = 0.33–0.55) for the warmer months, when the relative humidity was slightly lower (often below 60%). However, even at a high relative humidity, concentrations similar to those measured at the RIEP station sometimes occurred. The correlation may result from the fact that, at high humidity, water droplets floating in the air (for example from fog) may be treated as aerosol particles [28]. When air is drawn into the sensor, the droplets scatter light just as solid particles do, so the device can treat them as pollutants. Days with high relative humidity in the Polish climate occur more often in the cold part of the year; this therefore translates into a correlation between the deviations of the sensor readings and temperature, which is presented in Figure 6. In this case, the correlation coefficients are smaller, because low temperatures are not always accompanied by high humidity, and the temperature itself should not have a significant impact on the deviations of the low-cost sensor measurements.
It can be concluded that high humidity and low temperature (which often occur together) may cause the optical sensors to overestimate, sometimes by more than a factor of two. Conditions that favor underestimation include low humidity and high temperature.
The high correlation of the results means that, given a large sample of data collected in various meteorological conditions, a correction function can bring the sensor readings into fairly close agreement with the results from the reference instrument. A simple correction may be based, for example, on multiple regression, in which the correction function takes into account the measured PM10 concentrations and the meteorological parameter with the greatest impact on the results, relative humidity.
In order to determine the correction function, the analysis was carried out in two steps. First, using the 2nd-degree polynomial fit, for which the agreement was the highest, a quadratic equation was determined in which the average 24-h PM10 concentrations from the low-cost sensors were the independent variable (Equation (1)). Next, the resulting values, together with the relative humidity measurements, were used in a multiple linear regression (Equation (2)), so that the final result depends on the corrected PM10 concentration and relative humidity (Equation (3)).
where:
P: PM10 concentration measured by the low-cost sensors (μg/m³);
H: relative humidity measured by the low-cost devices (%);
P': recalculated PM10 concentration for the low-cost sensors, without relative humidity (μg/m³);
PC: recalculated PM10 concentration for the low-cost sensors, with relative humidity (μg/m³).
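The two-step procedure can be illustrated with a short Python sketch. It assumes the general forms P' = a·P² + b·P + c for the first step and PC = d·P' + e·H + f for the second; the coefficient names a to f, the function names, and the sample data are placeholders introduced here, and the fitted values from the study are not reproduced. The sketch uses only the standard library (in practice, a numerical library would normally be used for the least-squares step).

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def least_squares(columns, target):
    """Ordinary least squares via the normal equations (one list per regressor)."""
    k = len(columns)
    ata = [[sum(u * v for u, v in zip(columns[i], columns[j])) for j in range(k)]
           for i in range(k)]
    atb = [sum(u * t for u, t in zip(columns[i], target)) for i in range(k)]
    return solve_linear_system(ata, atb)


def fit_correction(sensor_pm10, humidity, reference_pm10):
    """Fit both steps; returns ((a, b, c), (d, e, f))."""
    ones = [1.0] * len(sensor_pm10)
    # Step 1: quadratic fit of the reference on the raw sensor reading P.
    squares = [p * p for p in sensor_pm10]
    a, b, c = least_squares([squares, list(sensor_pm10), ones], reference_pm10)
    p_prime = [a * p * p + b * p + c for p in sensor_pm10]
    # Step 2: multiple linear regression on P' and relative humidity H.
    d, e, f = least_squares([p_prime, list(humidity), ones], reference_pm10)
    return (a, b, c), (d, e, f)


def apply_correction(p, h, quad, lin):
    """Apply the combined correction to one reading (P in ug/m3, H in %)."""
    a, b, c = quad
    d, e, f = lin
    p_prime = a * p * p + b * p + c
    return d * p_prime + e * h + f


# Hypothetical training data: daily averages (not values from the study).
sensor = [15.0, 22.0, 30.0, 41.0, 55.0, 68.0, 80.0, 95.0]
humidity = [60.0, 85.0, 70.0, 92.0, 65.0, 88.0, 75.0, 95.0]
reference = [11.0, 15.0, 21.0, 26.0, 36.0, 41.0, 49.0, 55.0]

quad, lin = fit_correction(sensor, humidity, reference)
corrected = [apply_correction(p, h, quad, lin) for p, h in zip(sensor, humidity)]
print([round(value, 1) for value in corrected])
```

On the training data, each step can only lower (or leave unchanged) the sum of squared errors relative to the previous one, since the previous result is always one of the candidate fits; how well the coefficients transfer to new data is a separate question.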
After recalculating the measurement results according to Equation (3), the statistical parameters for both devices take the values shown in Table 7 and Table 8.
After recalculating the measurement results using the correction function, the obtained results are much closer to those of the reference instrument, and all of the errors decreased. An important aspect is the size of the test sample. In the analyzed case, the measurement results came from the 4-month period, covering different seasons, starting in winter and ending almost at the beginning of summer.
Based on this training set, research is also being carried out to demonstrate the equivalence (or the conditions necessary to achieve it) of measurements made using the low-cost PM sensors in relation to the reference methods. The results are presented in [23,29].
3.2. Verification of the Correction Function
To verify the correction function, two identical devices with the new low-cost PM DFRobot sensors (named S1 and S2) were installed close to the reference air quality monitoring station (less than 10 m away) in Nowy Sącz, in January 2018. It can be assumed that these low-cost devices and the professional instrument operated in the same environment. The purpose of these devices was to verify the agreement of their measurement results with the concentrations observed by the reference instrument. The results from both the low-cost sensors were recalculated using the correction function determined on the basis of the previous long-term measurements carried out in Rabka-Zdrój. The presented results include data from February to June 2018. Some statistical parameters are presented in Table A1, Table A2, Table A3, Table A4 and Table A5 and in Figure 7, Figure 8, and Figure A1.
February 2018 was characterized by relatively low air temperatures, which contributed to the increase in PM10 concentrations; this, in turn, was accompanied by a significant overestimation by both low-cost sensors. Values were higher by almost 50% in the case of sensor S1 and almost 60% in the case of sensor S2. The correlation coefficients between the two low-cost sensors and the RIEP station were very high (over 0.98). The largest differences between the concentrations measured at the reference station and by the low-cost devices were observed on 7 February. This day had one of the highest daily PM10 concentrations: 141 μg/m³.
After applying the correction function, a significant improvement in the quality of the PM10 concentrations from both low-cost sensors was obtained. The correlation coefficients remained at a very high level, while the greatest improvement was observed in the percentage deviations and the differences in absolute values (mean errors and percentage errors reached values below unity or slightly below zero). The average absolute percentage error was 9–12%, and deviations in relation to the concentrations measured at the reference station ranged from 17 μg/m³ to 20 μg/m³. There was also a significant reduction in the overestimation on days with high PM10 concentrations (7, 12, and 13 February), which is presented in Figure A1. The correction function achieved the desired effect undoubtedly because it was determined on the basis of a comprehensive data set, covering a relatively long measurement period carried out under different atmospheric conditions.
At the beginning of March, there were also very high concentrations (higher than in February), but in this case the fit was not as good as for the high concentrations in February. After applying the correction function, both sensors significantly underestimated the measured PM10 concentrations. The situation returned to normal after 5 March. This raised the question of how these days differed from others on which high concentrations were also recorded. An analysis of the basic meteorological parameters (i.e., relative humidity, temperature, wind speed, and wind direction) shows that during the first days of March the temperatures were very low, reaching minima below −20 °C at night and an average temperature below −15 °C (e.g., 1 March). In February, at the local maxima of the daily concentrations, the temperature usually fluctuated around zero (tending towards positive values), and the low-cost sensors overestimated by up to a factor of two. In the case of the March maxima, the initial overestimation was only around 20–40% compared to the RIEP. The correction function reduced the values from the S1 and S2 sensors by almost half. For the February overshoots, this brought the adjusted values quite close to the RIEP measurements; for the exceedances in the first days of March, it unfortunately caused a fairly large underestimation.
It seems that the reason for this was the humidity. As has been shown before, the optical sensors are affected by humidity, because small droplets of floating water, e.g., from fog, cause light scattering similar to PM particles. During the high concentrations in February, the relative humidity was high and the temperature was close to 0 °C. At the beginning of March, the relative humidity was also quite high, but at a significantly lower temperature the actual number of water droplets in the same unit of air volume was significantly lower than at the same relative humidity and a temperature almost 20 degrees higher (around 0 °C). A smaller number of water droplets probably resulted in less overestimation by the sensor. One can risk the hypothesis that the correction of the sensor readings should be based on absolute humidity. This would probably reduce the scale of the overestimation. Another potential solution would be to include temperature in the correction function, possibly only below a certain temperature limit. The suitability of these approaches will be verified in further analysis.
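The size of this effect can be estimated with a short calculation. The sketch below converts relative humidity to absolute humidity (water vapor mass per unit volume) using the Magnus approximation for saturation vapor pressure over liquid water; the choice of this approximation, its constants, and the function names are assumptions made here for illustration, not part of the study. Note that it quantifies water vapor, which is only a proxy for the number of droplets discussed above.

```python
import math


def saturation_vapor_pressure_hpa(temp_c):
    """Magnus approximation of saturation vapor pressure over water (hPa)."""
    return 6.112 * math.exp(17.62 * temp_c / (243.12 + temp_c))


def absolute_humidity_g_m3(temp_c, rh_percent):
    """Water vapor density (g/m3) from air temperature (deg C) and RH (%)."""
    vapor_pressure = rh_percent / 100.0 * saturation_vapor_pressure_hpa(temp_c)
    # 216.7 = 100 Pa/hPa * 1000 g/kg / 461.5 J/(kg K), the gas constant of vapor
    return 216.7 * vapor_pressure / (temp_c + 273.15)


# Same relative humidity, two temperatures similar to those discussed above.
for temp_c in (0.0, -20.0):
    ah = absolute_humidity_g_m3(temp_c, rh_percent=90.0)
    print(f"T = {temp_c:+.0f} C, RH = 90%: {ah:.2f} g/m3")
```

Under these assumptions, air at 0 °C holds roughly four to five times as much water vapor as air at −20 °C at the same relative humidity, which is consistent with the direction of the effect hypothesized in the text.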
In April and May, the correlation coefficients based on the raw measurements were slightly lower than in the two previous months. The application of the correction function was therefore likely to bring an improvement, and it did, especially in May, when including the meteorological conditions significantly improved the agreement for both sensors. In these two months, the measurement errors, in particular the absolute ones, improved significantly. In the end, their monthly values fluctuated around 20%, and the average errors were below 6%.
In June, after applying the correction function, the values of the correlation coefficients also improved. The sensors, however, began to underestimate the concentrations, and the mean values of the absolute and percentage errors increased. A possible reason is the period on which the form of the correction function was based, in which cooler days with higher concentrations prevailed, together with the fact that PM concentrations in Poland are generally lower in the warmer period than in winter.
In most cases, applying the correction function turned the initial tendency to overestimate the measured values into underestimation. This is evidenced by the mean error, which is positive for the uncorrected values and negative after applying the correction function. The only exception was May for sensor S2. In most cases, the error was at the level of a few μg/m³, especially for February, May, and partially April. The worst case was March, when the error values were close to 10 μg/m³. This regularity is also visible when comparing the maximum daily deviations above and below the values from the reference measurement devices. For the raw data, this range is shifted towards significant overestimation; after applying the correction, the center of the range moves towards zero or, in a few cases, takes a negative value.
It is also worth noting how the error depends on whether relative humidity is included in the correction function. A definite improvement in the accuracy of the results, by several dozen percent, is already achieved with the correction function that does not take the variability of humidity into account. The extended form of the correction function slightly improves some of the indicators, but the additional improvement is not as significant.
Using the correction functions in the analyzed cases resulted in a significant improvement and brought the source measurement results to an acceptable quality level. Taking into account additional meteorological parameters (such as relative humidity, and perhaps wind speed, which will be the subject of further analysis) may further improve the agreement between the low-cost devices and the reference stations.