Evaluation of Low-Cost Sensors for Weather and Carbon Dioxide Monitoring in Internet of Things Context

In a context of increased environmental awareness, the Internet of Things has allowed individuals or entities to build their own connected devices to share data about the environment. These data are often obtained from widely available low-cost sensors. Some companies are also selling low-cost sensing kits for in-house or outdoor use. The work described in this paper evaluated, in the short term, the performance of a set of low-cost sensors for temperature, relative humidity, atmospheric pressure and carbon dioxide, commonly used in these platforms. The research challenge addressed with this work was assessing how trustworthy the raw data obtained from these sensors are. The experiments made use of 18 climatic sensors from six different models, which were evaluated in a climatic chamber that reproduced controlled conditions of temperature and humidity. Four CO2 sensors from two different models were analysed through exposure to different gas concentrations in an indoor environment. Our results revealed temperature sensors with a very high coefficient of determination (r² ≥ 0.99), as well as the presence of bias and almost zero random error; the humidity sensors demonstrated a very high positive correlation (r² ≥ 0.98), significant bias and small-yet-relevant random error; the atmospheric pressure sensors presented good reproducibility, but further studies are required to evaluate their accuracy and precision. For carbon dioxide, the non-dispersive infra-red sensors demonstrated very satisfactory results (r² ≥ 0.97, with a minimum root mean squared error (RMSE) value of 26 ppm); the metal oxide sensors, despite their moderate results (minimum RMSE equal to 40 ppm and r² of 0.8–0.96), presented hysteresis, environmental dependence and even positioning interference. The results suggest that most of the evaluated low-cost sensors can provide a good sense of reality at a very good cost–benefit ratio in certain situations.


Introduction
In recent years, collaborative sensing, a concept well described by the authors in [1][2][3], has been used in several application fields due to paradigm-breaking features such as decentralization, the possibility of enhancing the space-time granularity of a sensing system, reduced costs, and its capability of giving users the power to be a node and become part of a solution to a shared concern or common problem. In short, the main point in collaborative sensing is the use of low-cost devices handled by "non-professional" individuals (citizen scientists) in large quantities to overcome costs and to increase the density of nodes, forming a dense, and even pervasive, monitoring system.
IoT 2020, 1 289 their own "IoT weather station" (such sensors are often sold as "do-it-yourself weather stations" by well-known online vendors) without any further refined data processing, such as machine learning. It is important to highlight that the sensors used in this study were chosen without commercial bias, trying to include a good diversity of brands and models. The tested environmental sensors include temperature and relative humidity sensors, evaluated in controlled conditions, and air pressure and carbon dioxide sensors, evaluated in natural conditions. The performance indicators were obtained by statistically analysing the sensor readings against reference readings, for temperature, humidity and carbon dioxide, and between sensor pairs, for the atmospheric pressure sensors.

Materials and Methods
The research approach included a procedure for selecting the sensors for analysis, the planning of the experiments, data collection, data analysis and the discussion of the obtained results, as discussed in the following sections.

Sensors
The sensors were selected using criteria that an individual would have when engaging in an environmental monitoring project on the Internet of Things. The sensors were chosen considering availability (available on well-known vendor websites; reachable by a simple internet search), price, ease of use (sensors built into ready-to-use interface boards for Arduino) and nominal parameters suitable for urban spaces with no extreme conditions (trying to cover an "average" situation). In addition to these criteria, the selection of sensors did not favour any specific commercial brand. The selected sensors for climatic variables (temperature, humidity and pressure) are presented in Table 1. The selected sensors for carbon dioxide are presented in Table 2. Three units of each climatic sensor were used, as well as two units of each carbon dioxide sensor.
Table notes: T, temperature; H, humidity; P, pressure. Prices as seen in the second half of 2020.
The climatic sensors and the carbon dioxide sensors used in this study are illustrated in Figure 1. The reference sensor used for temperature and humidity was the Lascar Electronics EL-USB-2, with a certificate of calibration available in [36]. The reference sensor used for carbon dioxide was the Vaisala GM70 with a CO2 probe. Both reference instruments are illustrated in Figure 2.

Performance Metrics
The sensors were evaluated considering three main metrics: accuracy, precision and trueness [37]. The accuracy assessment of the sensors was performed by the analysis of the mean error (ME) and the root mean squared error (RMSE) at different measurand levels. While the mean error provides information regarding the sensor bias, the root mean square error provides a more comprehensive view regarding the sensor accuracy.
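As a minimal sketch of the two accuracy indicators named above, the following Python functions compute the mean error and the RMSE of sensor readings against reference readings. The function names and the sample values are hypothetical and serve only as an illustration; they are not data from the experiments.

```python
import math

def mean_error(readings, reference):
    """Mean error (bias): average of the sensor-minus-reference differences."""
    diffs = [s - r for s, r in zip(readings, reference)]
    return sum(diffs) / len(diffs)

def rmse(readings, reference):
    """Root mean squared error of the readings against the reference."""
    squared = [(s - r) ** 2 for s, r in zip(readings, reference)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical readings (in °C) at one stable level, for illustration only.
sensor = [25.4, 25.6, 25.5, 25.5]
reference = [25.0, 25.0, 25.0, 25.0]

print(f"ME = {mean_error(sensor, reference):.3f} °C")
print(f"RMSE = {rmse(sensor, reference):.3f} °C")
```

Because the RMSE combines the bias with the scatter of the readings, it is never smaller than the absolute value of the mean error.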
As precision is defined by the statistical dispersion of the measurements of a quantity in similar conditions, the standard deviation computed from each sensor's readings at stable and reproducible levels was used as a precision metric. Additionally, to illustrate the impact of this dispersion at each stable level, the signal-to-noise ratio (SNR, or the inverse of the coefficient of variation, cv⁻¹) was also considered. This parameter, in turn, was computed as the average of a sensor's readings divided by their standard deviation at that measurand level. A precise sensor would present a standard deviation near zero and a very high numerical value for the SNR (cv⁻¹).
The information regarding the trueness of the sensors was obtained in terms of the determination coefficient (r²) and the analysis of the residuals in the linear relationship between all the observed and expected values during the quantity variations. The expected output for a hypothetical ideal sensor would be a straight line from the bottom-left to top-right corner, with a y = x relationship. To estimate this metric, we applied a residuals calculation to the linear relationships. For convenience, this metric is referred to as the dynamic residuals (DR) hereafter, and is defined by Equation (1) (with the boundaries determined by the temperature experimental limits):

$$ DR = \sqrt{\frac{1}{\Delta T}\int_{T_i}^{T_f}\left[\hat{y}(x) - y(x)\right]^2\,dx} \qquad (1) $$

where ŷ(x) is the averaged linear model output of a given sensor; y(x) is the expected, or ideal, output (y = x); T_f is the final temperature; T_i is the initial temperature; and ΔT is the temperature variation. The interpretation of this parameter is straightforward: it is the root mean squared distance between two lines in a defined range, and the closer to 0.0 the value of DR, the better the performance of the corresponding sensor.
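The precision and trueness indicators described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the sample standard deviation is assumed (the text does not say whether the sample or population form was used), ΔT is taken as T_f − T_i, and DR is computed in closed form as the root mean squared distance between a fitted line ŷ = slope·x + intercept and the ideal line y = x. All function names are hypothetical.

```python
import math
import statistics

def precision_metrics(readings):
    """Return (standard deviation, SNR) for readings at one stable level."""
    sd = statistics.stdev(readings)           # sample standard deviation (assumption)
    mean = statistics.fmean(readings)
    snr = mean / sd if sd > 0 else math.inf   # inverse of the coefficient of variation
    return sd, snr

def dynamic_residuals(slope, intercept, t_i, t_f):
    """Root mean squared distance between y_hat = slope*x + intercept and y = x
    over [t_i, t_f], using the exact integral of the squared difference."""
    m = slope - 1.0   # slope of the difference line y_hat(x) - x
    b = intercept
    integral = (m * m * (t_f ** 3 - t_i ** 3) / 3.0
                + m * b * (t_f ** 2 - t_i ** 2)
                + b * b * (t_f - t_i))
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(integral, 0.0) / (t_f - t_i))

# An ideal sensor gives DR = 0; a pure offset of 0.5 gives DR = 0.5.
print(dynamic_residuals(1.0, 0.0, -5.0, 40.0))
print(dynamic_residuals(1.0, 0.5, -5.0, 40.0))
```

The second call also illustrates why DR should not be read in isolation: a constant bias and a slope error can produce similar values.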

Experimental Protocol
A brief description of the experiments for each group of sensors and equipment used, as well as the reference instruments and expected outcomes, is presented in the following subsections. This description is provided here mostly to enable the reproducibility of these experiments. The experiments were conducted in two stages: one for climatic sensors (Section 2.3.1) and another one for carbon dioxide sensors (Section 2.3.2).

Climatic Sensor Experiment
The experiment for climatic sensors was conducted inside a controlled climatic chamber (Aralab© Fitoclima®) reproducing combinations of temperature and humidity. Two experimental profiles were planned. The first experimental profile had four temperature stages (−5, 10, 25 and 40 °C) and three humidity levels applied in each stable temperature stage: 30%, 50% and 80%. Each temperature stage lasted for eleven hours in stability, and one hour in transition to the next level. Considering that the chamber was unable to control humidity below 0 °C, the second profile was programmed only with positive temperatures, each one with three humidity levels: 30%, 50% and 80%. Each combination of temperature and humidity lasted for five hours after stabilization, each humidity level change lasted for thirty minutes, and the temperature level changes lasted for one hour. Both experimental profiles were executed three times, with two sets of sensors in the chamber at a time.
Each sensor group was formed by six different sensor units and was wired into an Arduino Uno device, powered by an external and independent power supply. The Arduino device also had an SD shield attached to it for data logging, and the RTC (Real-Time Clock) service available for time synchronization. The sampling rate was set to one sample per minute. Figure 3 shows a group of climatic sensors (temperature, relative humidity and atmospheric pressure) wired into an Arduino outside and inside the climatic chamber.
Between executions, the generated data were extracted from the SD cards, and one set of sensors was replaced (one set was kept at the end of each execution). Considering three sets of sensors, designated "A", "B" and "C", the pairs subjected to the experiment were made by combinations of two out of three, resulting in three executions in which each set of sensors was subjected to the experiment twice. The same logic applied to the experimental profile focused on the humidity sensors.
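The pairing scheme described above (three sensor sets, two in the chamber per execution) can be enumerated with a short Python sketch. The listing order produced here is only the lexicographic order of the combinations; the order in which the pairs were actually run is not stated in the text.

```python
from collections import Counter
from itertools import combinations

sensor_sets = ["A", "B", "C"]

# Every way of placing two of the three sets in the chamber together.
executions = list(combinations(sensor_sets, 2))

# Count how many executions each set takes part in.
usage = Counter(s for pair in executions for s in pair)

print(executions)   # [('A', 'B'), ('A', 'C'), ('B', 'C')]
print(dict(usage))  # {'A': 2, 'B': 2, 'C': 2}
```

This confirms the count given in the text: three executions per profile, with each set exposed twice.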
Regarding the atmospheric pressure sensors, their readings reflected the natural external conditions during the experiments, since the chamber used neither controlled its internal pressure nor was hermetically sealed.
The evaluated sensors do not differ in terms of the transduction principle for each measurand.

Carbon Dioxide Experiment
Due to the lack of equipment able to control CO2 levels in a chamber, the experiment to assess these sensors was designed to take advantage of the natural conditions in indoor environments with known human occupancy. The experimental plan for carbon dioxide was to place the sensors together with the reference instrument in these environments. Two different indoor environments were planned: an office room with ventilation control during working hours, an approximate volume of 125 m³ and random occupancy during the daytime (a minimum of one person and a maximum of 10 people at the same time); and a bedroom during the night time, without ventilation, with an approximated volume of 30 m² and two occupants.
The first situation was planned to verify random variations, as people might enter and leave the room at any time. The second situation was planned to verify the sensor response to higher concentrations, since CO2 can accumulate over time, especially in the absence of ventilation systems (which is a common situation in many residential buildings), as assessed by the authors in [38]. Figure 4 illustrates one unit of each of the tested sensors and the reference instrument.
It is relevant to mention that the carbon dioxide sensors in question use different transduction principles: while the MG-811 is a metal oxide semiconductor (MOS) sensor that needs to heat its sensitive layer, the MH-Z16 uses non-dispersive infra-red (NDIR) sensing to measure the quantity of energy absorbed by CO2 at a specific wavelength. These sensing mechanisms may imply different performance expectations, as reported in [39,40].

Temperature Sensors
An example of one (out of six) experimental output from the temperature sensors is presented in Figure 5. Both accuracy and precision analyses were performed separately for each stable temperature level for all experiments, and considering all sensor units, by using data slices in the regions of interest. The accuracy metrics obtained by averaging all the execution results for the temperature sensors are presented in Table 3.
Table 3. Summary of averaged accuracy indicators for temperature sensors (all values in °C).

Columns: Sensor Model; Mean Error at each temperature level; σME; Root Mean Squared Error at each temperature level.
The farther away from zero these values are, the lower the accuracy of the respective sensor is. These numbers demonstrate that the HTU21D presented the lowest overall mean error, whilst the BME280 presented the highest. However, variations in the mean error for different temperature levels may indicate a heterogeneous accuracy along the evaluated temperature range. As the exposed values were averaged from individual sensor unit readings from all the executions, the consistency of the error is represented by the standard deviation of the observed mean errors (σME), for which lower values represent more homogeneous mean errors from different sensor units in different experimental executions (e.g., a constant error from all units of a sensor model would result in σME = 0; in other words, a purely systematic error). On this, it can be noted that the most consistent sensor model was the MPL3115A2, followed by the HTU21D. The least consistent sensor models were the DS18B20 and the AM2302.
Expanding the considerations to professional requirements, the World Meteorological Organization (WMO) points out an ideal accuracy of ±0.5 °C in its guidelines for conventional methods for weather observation [41]. Despite all the manufacturer statements, the only model that would meet this requirement, in a "plug-and-play" condition, was the HTU21D. Nevertheless, all the other sensor units would meet the mentioned requirement without greater efforts when using some form of data adjustment.
The computed data about the precision of the temperature sensors are presented in Table 4. The highest standard deviation was observed at the 40 °C level, where all the sensors exceeded the 0.1 °C mark. This occurrence may be explained by the thermal noise, intrinsically related to the semiconductors. The WMO, regarding precision, recommends a value of 0.2 °C for temperature measurements. It can be concluded that although the precision of the investigated sensors decreases with increasing temperature, they can meet the WMO's requirement at moderate temperatures. An interesting fact that can be observed in these data is that the sensors do not significantly differ in the values of the standard deviations, as they showed very similar values along the investigated range. This suggests that all the sensor models were affected by noise with similar intensity.
The information presented so far considered the static behaviour of the sensors. However, the performance of the sensors during the temperature variations is also of interest. Scatter plots generated with datapoints during temperature changes are presented in Figure 6, with aggregated data separated by sensor model.
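To make the error-consistency indicator (σME) and the comparison with the WMO accuracy figure more concrete, a minimal Python sketch follows. It assumes the population standard deviation for σME and a simple "every mean error within ±0.5 °C" test; the text does not specify either choice, and the function names and input values are hypothetical.

```python
import statistics

WMO_TEMP_ACCURACY_C = 0.5  # ±0.5 °C, the figure quoted in the text from [41]

def error_consistency(mean_errors):
    """σME: standard deviation of the mean errors observed across sensor
    units and executions (population form, an assumption of this sketch)."""
    return statistics.pstdev(mean_errors)

def meets_wmo_accuracy(mean_errors, limit=WMO_TEMP_ACCURACY_C):
    """True if every observed mean error lies within ±limit."""
    return all(abs(me) <= limit for me in mean_errors)

# Hypothetical mean errors (°C) of three units of one model, illustration only.
purely_systematic = [0.3, 0.3, 0.3]
print(error_consistency(purely_systematic))   # 0.0: identical error in all units
print(meets_wmo_accuracy(purely_systematic))  # True
```

A model with σME close to zero has an error that is mostly systematic, which is the case where a fixed offset correction is most likely to help.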
All sensors presented the expected linearity with no anomalies. However, the obtained relationships, which were also used to obtain the dynamic residuals (DR) of all the sensor units (see Table 5; the r² values were omitted, since they all exceeded 0.99), demonstrate that some sensor models had a subtle, although not critical, difference in slope and intercept compared with the ideal behaviour (y = x).
Table 5. Averaged linear models for each sensor model and its respective dynamic residuals (DR) indicator.

Columns: Sensor; Averaged Linear Model; DR (°C).
It is important to emphasize that this metric, if analysed alone, has no descriptive value, as some situations may lead to low DR values, such as a combination of positive bias and response delay. The DR values were considered very satisfactory for the AM2302 and BMP180 sensor units. The DR value for the HTU21D can also be considered satisfactory, but marginal. The DR values of the other sensor models exceeded 1.0 °C. Sensitivity is related to the slope of the averaged linear model. The obtained slopes were near 1.0 for all the sensor models (minimum of 0.970, for AM2302; maximum of 0.999, for BMP180). In this respect, it can be asserted that these sensors do not present issues regarding their sensitivity.

Relative Humidity Sensors
The analysis of the humidity sensors required additional attention since their data are more segmented, as diverse combinations of temperature and humidity levels were programmed. Figure 7 illustrates the readings from the humidity sensors inside the controlled chamber during a run of the second experimental profile.
The expected humidity levels were 30%, 50% and 80%. However, it can be noted from Figure 7 that the reference sensor provided different values at certain moments (e.g., 37% at Hour 28). This reveals that humidity control with different temperature levels is a complex task, even for certified devices.
The accuracy indicators were obtained for each combination of humidity and temperature. The results for the ME and for error consistency (σME) are presented in Table 6, and the obtained results for the RMSE, with individualized information, are presented as a colourmap in Figure 8.
Regarding the error scatter, the lowest value of σME was observed for the HTU21D sensor (1.8%), and the highest value for the AM2302 (2.7%). These values were not considered critical but should not be ignored either, since they are indicative of random error (used for precision metrics). The precision metrics of the humidity sensor are presented in Table 7, which contains the standard deviations averaged from all sensor units' outputs.  Regarding the mean error, the higher overall agreement (lower ME) between each sensor model and the reference was observed for the humidity level of 80%. Considering the WMO accuracy recommendations-1% at higher values of relative humidity (80% or more) and 5% at moderate values of relative humidity [41]-the evaluated sensor models reached different levels: the AM2302 met the requirements at all levels, except at 80% at 10 • C, where it exceeded the value of 1%; the HTU21D did not meet the accuracy requirement at any point analysed; the BME280 met the requirement at 30% and 50% levels but, at 80%, only met it at the temperature level of 10 • C. Regarding the error scatter, the lowest value of σ ME was observed for the HTU21D sensor (1.8%), and the highest value for the IoT 2020, 1 297 AM2302 (2.7%). These values were not considered critical but should not be ignored either, since they are indicative of random error (used for precision metrics).
The precision metrics of the humidity sensors are presented in Table 7, which contains the standard deviations averaged over the outputs of all sensor units. The overall precision was very satisfactory, even when compared with the WMO's requirements (5% at mid-range humidity).
The trueness of the humidity sensors during relative humidity variations was also assessed. Figure 9 contains the scatter plots, with all the datasets from reference and the sensor readings (in grey, all individual readings; in colour, the averages). The respective averaged linear models are presented in Table 8, which also presents the calculated values for the DR.
The grey points scattered around the averaged dataset in Figure 9 indicate that these sensors were more susceptible to noise during the quantity variations. Regarding the sensitivity, all the sensors presented slope values above 1.0. The most significant case was the AM2302 sensor, whose slope exceeded the ideal value by 12%, suggesting that this sensor is more affected by errors than the other models, since it overestimated the relative humidity variations.
The DR values agreed with the mean error values observed in static conditions (see Table 6), indicating that the overall error magnitude did not differ whether these sensors were reading static or dynamic values of relative humidity.
Despite all the observed issues regarding the accuracy of these sensors, in either dynamic or static conditions, there was no evidence that a linear calibration would fail to produce satisfactory outcomes in a data quality enhancement stage.
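As an illustration of what such a linear calibration could look like, the sketch below fits the sensor readings against the reference by ordinary least squares and then inverts the fitted line to correct the readings. The function names and the data are hypothetical: the 12% slope excess mirrors the AM2302 case mentioned above, while the 2% offset is made up.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def calibrate(reading, slope, intercept):
    """Invert the fitted sensor model to estimate the reference value."""
    return (reading - intercept) / slope


# Illustrative data only: a sensor that overestimates variations by 12%
# and carries a constant 2% RH offset.
reference = [30.0, 50.0, 80.0]
sensor = [1.12 * r + 2.0 for r in reference]

slope, intercept = fit_line(reference, sensor)
corrected = [calibrate(s, slope, intercept) for s in sensor]
```

The correction only works while the sensor response stays linear and stable over time, which is the condition the results above support for the humidity sensors.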


Atmospheric Pressure Sensors
The evaluation of the atmospheric pressure sensors relied on an inter-comparison between the sensor units. Figure 10 contains the atmospheric pressure sensor readings during one experimental execution.
To assess the reproducibility of the barometric sensors, the readings from all the experimental executions were grouped into a single dataset. The comparison was then made between different units of the same sensor model that were used at the same time and under the same experimental conditions (two units, out of three, at a time). The outcomes for these sensors are presented in Figure 11, which illustrates the scatter plots obtained for the individual sensor models, and in Table 9, which presents the numerical results of the inter-comparison (the r² values exceeded 0.99).
In terms of inter-correlation, the linear regression closest to the ideal (1:1) was observed between Units A and B for the BMP180 sensor, between Units A and B for the BME280, and between Units A and C for the MPL3115A2 sensor. However, the RMSE complements this analysis, since its value denotes how well the presented regressions describe the pairs of datasets. In this case, both the lowest and highest RMSE values occurred for the BME280 sensor: the lowest value between Units A and C, and the highest value between Units A and B. The observed values indicate that the reproducibility of the pressure sensors may be considered moderate to satisfactory for applications where the uncertainty is not critical.
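The pairwise inter-comparison described above can be sketched as follows. The example regresses the readings of one unit on those of another for every pair of units and reports the slope, intercept, r² and the RMSE of the residuals around the regression line. Treating the RMSE as a residual RMSE is an assumption based on the description above, and the pressure values (in hPa) are made up.

```python
import math
from itertools import combinations


def compare_pair(x, y):
    """Regress y on x; return (slope, intercept, r2, residual RMSE)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    r2 = (sxy * sxy) / (sxx * syy)
    residuals = [b - (slope * a + intercept) for a, b in zip(x, y)]
    rmse = math.sqrt(sum(e * e for e in residuals) / n)
    return slope, intercept, r2, rmse


# Hypothetical simultaneous readings from three units of one model (hPa).
units = {
    "A": [1000.0, 1005.0, 1010.0, 1015.0],
    "B": [1000.5, 1005.5, 1010.5, 1015.5],  # constant offset from Unit A
    "C": [1000.2, 1004.9, 1010.3, 1014.8],  # small scatter around Unit A
}

# Two units out of three at a time: (A, B), (A, C), (B, C).
results = {
    (u, v): compare_pair(units[u], units[v])
    for u, v in combinations(units, 2)
}
```

A slope close to 1 with a non-zero intercept (as for the pair A and B here) points to a constant offset between units, whereas a larger residual RMSE points to scatter that a linear relation cannot explain.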

Carbon Dioxide Sensors
Three experimental executions were performed, two of which took place in an office room (the first and third ones), interspersed with one in a bedroom. The different environments create different expectations regarding the observed concentration of the gas in natural (non-controlled) conditions.
Because an additional step was necessary to extract information from the raw readings of the metal oxide sensors, this section includes two subtopics: one that addresses the issues observed in the calibration of the MOS sensor (Section 3.4.1), and another that presents the results (Section 3.4.2).

Metal-Oxide Sensor Calibration
A detail about the MOS sensor (MG-811) is that it did not provide its readings as gas concentrations (ppm) but rather as raw voltage values that needed to be converted into ppm. The sensor manufacturer made available an approximate conversion equation, which we observed to be significantly inaccurate. This made it necessary to post-convert the data of each individual sensor using the reference readings. Figure 12 depicts the nominal conversion curve and the observed calibration curves using data from the first execution in the office room. Additionally, Unit A of the MOS sensor presented a very compressed output compared to Unit B: 15 mV of signal excursion against 40 mV. This, together with the different baseline values, resulted in very different calibration curves, whose coefficients differed by as much as an order of magnitude. It was also found that the physical orientation of the sensor affects both its baseline and its sensitivity (the standard orientation used was with the sensitive layer horizontally positioned). Consequently, sudden movements may also affect its readings. It can already be concluded that this sensor model is not suitable for mobile air-quality sensing platforms such as those mounted on public buses, bicycles or taxis.
Moreover, the calibration of the MOS sensor was found to be very unstable, as each individual sensor required a new conversion process each time it was powered on. This information can be observed in Table 10, which contains the conversion equations obtained from each experiment (the first and third occurred in the office room; the second, in the bedroom).
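A per-session conversion of this kind could be sketched as below. The functional form is an assumption: the sketch supposes that the output voltage varies linearly with the base-10 logarithm of the concentration, which may differ from the equations reported in Table 10. The coefficients, voltages and function names are hypothetical and serve only to show the fit against co-located reference readings and its inversion.

```python
import math


def fit_log_linear(voltages_mv, reference_ppm):
    """Least-squares fit of V = a + b * log10(ppm) (assumed model form)."""
    x = [math.log10(c) for c in reference_ppm]
    n = len(x)
    mean_x = sum(x) / n
    mean_v = sum(voltages_mv) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxv = sum((xi - mean_x) * (vi - mean_v) for xi, vi in zip(x, voltages_mv))
    b = sxv / sxx
    a = mean_v - b * mean_x
    return a, b


def voltage_to_ppm(v_mv, a, b):
    """Invert the fitted model to convert a raw voltage into ppm."""
    return 10 ** ((v_mv - a) / b)


# Made-up session data: voltage falls as the concentration rises.
reference_ppm = [400.0, 1000.0, 2000.0]
voltages_mv = [450.0 - 40.0 * math.log10(c) for c in reference_ppm]

a, b = fit_log_linear(voltages_mv, reference_ppm)
estimate = voltage_to_ppm(voltages_mv[1], a, b)
```

Given the instability reported above, the coefficients `a` and `b` would have to be refitted after every power-on, which is only possible while a reference instrument is co-located with the sensor.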
Figure 13 illustrates the time series of the readings taken in the office room (first and third executions, chronologically), with mixed occupancy and a ventilation system during the daytime. Figure 14 illustrates the measurements obtained in the bedroom, where two regions with higher concentrations can be noticed: the first, from Hour 6 to 18, corresponded to the occupancy of two persons during night and sleep time; the second, after Hour 32, corresponded to mixed occupancy with only one person staying during sleep time.
It could be noticed in the data that the two units of the MOS sensor had different convergence times (even after the nominal warm-up time). We defined the convergence time as the time required for the sensor readings to come within 100 ppm of the reference for the first time and stay there. An example can be seen in Figure 13a, where Unit A of the MOS sensor (in blue) took about 20 h to have its adjusted readings enter the stipulated margin of convergence, whilst Unit B (in green) reached convergence much earlier: after 4 h. The difference in performance between the metal oxide sensor units might indicate very poor reproducibility in terms of sensor manufacturing.
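The convergence-time definition given above can be expressed as a short routine. This is a minimal sketch with hypothetical names and made-up data: it looks for the last sample that lies outside the 100 ppm margin and reports the time of the following sample, so a sensor that enters the margin and leaves it again is not counted as converged at its first entry.

```python
def convergence_time(times_h, sensor_ppm, reference_ppm, margin_ppm=100.0):
    """Earliest time from which |sensor - reference| stays within the margin.

    Returns None if the last sample is still outside the margin.
    """
    last_outside = -1
    for i, (s, r) in enumerate(zip(sensor_ppm, reference_ppm)):
        if abs(s - r) > margin_ppm:
            last_outside = i
    if last_outside == len(times_h) - 1:
        return None
    return times_h[last_outside + 1]


# Made-up hourly readings: the sensor enters the margin at hour 1,
# leaves it at hour 2 and stays inside from hour 3 onwards.
times_h = [0, 1, 2, 3, 4, 5]
reference_ppm = [500.0] * 6
sensor_ppm = [900.0, 580.0, 650.0, 590.0, 540.0, 510.0]

t_conv = convergence_time(times_h, sensor_ppm, reference_ppm)
```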
In terms of calibration, the assessed NDIR units had a straightforward one-button self-calibration process, where the user presses the button while the sensor is exposed to fresh air for a few minutes (where the average CO2 levels are approximately 400 ppm). The auto-calibration procedure for these units was performed once before the experiments, in clean air, to verify its robustness (or persistence) over time of use. The accuracy indicators of these sensors are presented in Table 11.
The NDIR sensors presented the best overall accuracy, with very satisfactory results in the first experiment (just after the calibration), and we already consider this specific model suitable for citizen science and IoT applications. However, they presented higher errors in subsequent experiments. A more significant bias was found in the last measurement, which occurred in the same conditions as the first measurement, but on different days. This suggests that this sensor unit (Unit B) may be affected by monotonic drift with time (one-directional increasing bias). The MOS sensors presented satisfactory results in the last measurement (the office room experiment repetition), but only within a limited time window due to their stabilization time, which is heterogeneous and whose length is hard to predict before the sensor is put into use.
The linearity of these sensors was also evaluated through scatter plots containing the datasets from the sensors and reference. The plots are presented in Figures 15 and 16, for the measurements inside the office room and in the bedroom, respectively. The data containing the linear relationship between the sensors and the reference are presented in Table 12.

[Figures 15 and 16: scatter plots; axes: sensor readings (×100 ppm) and reference readings (×100 ppm).]
The NDIR sensors presented good linearity, with r² values above 0.97 for all measurements. In terms of sensitivity, they showed a moderate behaviour, with an average slope of 1.11 (minimum of 1.03 and maximum of 1.17). The subplot (b) in Figure 15 clearly depicts a monotonic drift of Unit B of the NDIR sensor, considering that the slope of this device was kept almost the same during the measurements and only the y-intercept point of the linear regression was increased. This phenomenon was also present but less evident in Unit A of the NDIR sensor model MH-Z16.
Summarizing the MOS sensor outcomes, Unit A presented inconsistent stabilization times, making it difficult to establish a linear relationship of its converted output with the reference (its scatter plots were formed with only part of its output). Considering the stabilized readings, they demonstrated a satisfactory linearity, though. However, the outputs illustrated in Figures 15 and 16, associated with the linear relationships presented in Table 12, led us to assert that this sensor unit also presented issues regarding its sensitivity.

Discussion
The evaluated low-cost climatic sensors, subjected to different scenarios inside a controlled climatic chamber, presented a good cost-benefit ratio, considering that most sensors are very cheap and yet present satisfactory performance indicators. However, in terms of accuracy, there was evidence suggesting that these sensors are not calibrated in a full-range manner from the factory, and this condition may mean they do not meet professional accuracy requirements for environmental monitoring, if used in decentralized volunteer-based campaigns, for instance. On the other hand, polluted readings or bad-quality data are not unique to low-cost sensing, as even professional monitoring may provide erroneous or biased readings due to several factors, including the sensors themselves, as discussed by the authors in [26] and in [42].
Regarding the temperature sensors, the best overall performance, in terms of both accuracy and precision, was observed for the HTU21D, and the closest approximation to the reference during dynamic conditions was observed for the BMP180, which presented the lowest value for the DR metric (0.25 °C). The worst case for accuracy was observed for the BME280 and MPL3115A2 sensors, with the error always above 1.0 °C, exceeding their nominal parameters (as stated in the datasheets). In dynamic conditions, the worst performance was achieved by the BME280, with a DR value of 1.9 (perhaps due to its bias), almost double that of the second-worst sensor, the DS18B20, with 1.0. Regarding the precision of the temperature sensors, all the sensors were found to be very precise. As a more general conclusion, we observed that temperature sensors are more likely to present systematic errors (bias) than random errors (noise). Additionally, as the sensors presented good linearity and sensitivity (see the slopes of the linear relationships presented in Table 5), they could detect variations in temperature very satisfactorily.
The humidity sensors, in general, demonstrated moderate accuracy and precision. A good linearity was observed, as well as a very strong correlation (trueness) between the sensors and the reference instrument. The slopes of the linear models obtained from the data indicate that these sensors can follow daily humidity variations of the same magnitude considering the experimental boundaries of 30% and 80% air relative humidity. The best overall performance amongst the evaluated sensors was achieved by the BME280 humidity module. These observations can be complemented by the results reported by the author in [43], who found that low-cost humidity sensors using capacitive transduction may present satisfactory results even in extreme conditions of very low temperatures and low vacuum. However, these sensors can also present significantly biased data, as observed for the HTU21D sensor model. Systematic errors can be easily corrected, but the challenge is to identify whether the sensor is biased. When the sensors are managed by lay individuals, this can be a hindering factor for obtaining high-quality data in the context of citizen science. As a suggestion, when a reference is unavailable for co-location, one can compare the readings from a sensor with the official weather information of the local provider over time and check whether the low-cost sensor follows the daily pattern of the official data. The presence of random error (noise) was more pronounced in the humidity sensors, yet it did not prove to be critical in relative terms.
The verdict about atmospheric pressure sensors lies in their reproducibility. The results obtained from the inter-comparison of the sensor units allow us to conclude that the evaluated sensors present very good reproducibility. Another finding was a possible cross-interference from temperature and humidity in the sensor readings, since the deviation observed between the BME280 and BMP180 readings seemed to follow a pattern. In practice, such an issue can be overcome by using machine-learning algorithms that model the error from multivariate data, as demonstrated in [35].
The positive findings about the utilization of the low-cost climatic sensors can be strengthened if the application presented by Mwangi [44] is considered: the author built weather stations with low-cost sensors that were later tested and assessed at NOAA (National Oceanic and Atmospheric Administration of the United States of America) facilities, and then managed to deploy them in hard-to-reach sites in Kenya. Moreover, these weather stations used two sensor models that were investigated in this paper: the HTU21D and BMP180 models. Another example of work with interesting achievements using low-cost climatic sensors is the "CanSat" design, launch and data collection experience reported by Colin and Jimenez-Lizárraga [12], which used, with no issues mentioned, the MPL115A2 sensor, a previous version of the MPL3115A2 used in this paper.
The carbon dioxide sensors led to different conclusions in our assessment in the indoor environment under natural conditions (an expected outcome, since they use different transduction principles). While the NDIR sensor (MH-Z16) showed acceptable accuracy and good linearity, the MOS sensor (MG-811) generally presented poorer performance due to its unpredictable stabilization time, inaccurate nominal conversion curve, hysteresis, higher bias and significant interference from physical movements and orientation (both baseline and sensitivity were affected by the physical orientation of the sensor). None of these issues is mentioned in the sensor datasheet.
The analysed data show that MH-Z16 is a ready-to-use sensor with built-in calibration suitable for indoor monitoring. In brief terms, a USD 50 device showed very similar performance to a USD 2000 certified instrument (Vaisala), proving to be a cost-effective device. The MG-811, in turn, considering the options that a citizen scientist would have, for instance, has its application list reduced to basically a carbon dioxide detector, since even movements and physical orientation affected its readings. This sensor may be used in fixed spots, with a stable power supply and with no sudden temperature variations (e.g., exposure to cold wind streams), so it can serve as a complementary asset for indoor surveillance or human presence detectors [21,22,45]. Without a reference being co-located with this sensor, the MG-811 alone is unable to provide information. Accordingly, it can be said that NDIR sensors can achieve superior performance when compared to MOS sensors for carbon dioxide monitoring in the Internet of Things scope. However, the monotonic drift of NDIR sensors still necessitates a very long-term investigation to extract conclusive information on whether they can play an important role in the assessment of climate change, for example, especially in the monitoring of carbon dioxide levels in the atmosphere, which requires constant and reliable measurements over years.
It is worth emphasizing that, although many authors make relevant contributions in the field of low-cost environmental sensors, most of the observed contributions are scenario analytics [46,47], field applications [5,17,48,49] and advanced techniques for improving sensor data accuracy [33,34,50]. In view of these observations, this work addressed a little-explored issue by analysing the behaviour and performance of low-cost environmental sensors from the perspective of an individual engaged in using the Internet of Things to build their own platform. This perspective is justified by the fact that these individuals may not have access to data calibration tools, such as reference sensors or machine-learning algorithms, and yet they remain interested in building and keeping their weather stations running and providing data, either due to particular motivations or by engaging in collaborative sensing campaigns.

Conclusions
This paper presents an original experimental study on the performance metrics of sensors commonly used in instrumentation within the scope of environmental monitoring using the Internet of Things, providing a statistical analysis of the raw data obtained from the sensors exposed to scenarios designed to obtain as much performance information as possible (the datasets are given in the Supplementary Materials section). The evaluation included different low-cost environmental sensors for temperature, humidity, atmospheric pressure and carbon dioxide that can be easily obtained at affordable prices from recognized online stores, making the experimental setup easily reproducible for anyone who is interested.
As findings, it turns out that while some of these sensors are ready-to-use devices and can present reliable readings even without previous assessment, in terms of being able to provide a good sense of reality at a very low price, other sensors presented issues regarding accuracy and sensitivity. We also found that the information contained in the sensor documentation might sometimes be partially inaccurate. As an example, consider the temperature accuracy stated in the datasheets of the BME280 and the MPL3115A2: the nominal levels of accuracy were claimed to be ±0.5 and ±1.0 °C, while the average RMSEs observed were 1.7 and 1.2 °C (from −5 to 40 °C), respectively. Considering the MG-811 MOS sensor, the lack of details in its datasheet was found to be much more critical. This sensor would be completely useless without a reference to calibrate it, since its nominal conversion curve was proven to be incorrect and its susceptibility to interference was not stated clearly in the document.
Moreover, to fully meet professional requirements, such as those proposed by the WMO, the individual calibration of these sensors is desired. This suggests that, without efforts in individual sensor calibration, citizen science and Internet of Things approaches for environmental monitoring cannot be compared to conventional methods yet, due to producing biased readings (in temperature monitoring, for instance).
If it were necessary to choose the best sensor for each quantity in "from factory" conditions, from the evaluated ones, we would recommend the HTU21D for temperature, the BME280 for humidity and atmospheric pressure, and the MH-Z16 for carbon dioxide. However, as mentioned, if the operation requires a high-end specification beyond the Internet of Things scope, actions must be taken to obtain higher-quality data, such as machine-learning methods for real-time calibration.
Aware of the limitations of this work, we can mention, as future work, further investigations to cover the aging of these sensors through the persistence of performance indicators in their long-term use and the robustness of a calibration curve (for the case of metal oxide sensors). In addition, it is also desirable to evaluate the behaviour of climate sensors in an external environment, when they are exposed to natural conditions where the measurements from them can change unexpectedly.