1. Introduction
Agriculture plays a crucial role in modern society, forming the backbone of the global food industry and being a key factor for the global economy [
1], with the rapid growth of population leading to an over-reliance on conventional farming practices in order to meet demand [
2]. However, the implementation of these farming practices, such as conventional tillage, often poses significant challenges to the environment, such as climate change, soil degradation, and CO₂ emissions [
3,
4,
5]. The urgent need to shift to more sustainable and still-productive systems of farming instigated the development of precision agriculture (PA). The International Society of Precision Agriculture (ISPA) defines PA as a management strategy that involves gathering, processing, and analysing data over time and across locations for individual plants and animals [
6].
Data acquisition is one of the most essential components of PA; in particular, capturing meteorological data is crucial for agricultural practices [
7,
8]. Among these data, air temperature stands out as a vital parameter, influencing not only crop yields but also livestock productivity and the overall stability of agricultural ecosystems. It is often leveraged in crop yield production models, serving an essential role [
9,
10,
11]. In crop production, a fluctuation of ±2 °C from the mean seasonal temperatures during the wheat growing season could lead to significant grain yield losses, potentially reaching up to 50% [
12,
13]. From the perspective of livestock production, heat stress resulting from temperature fluctuations has been shown to reduce milk yield in dairy cows by approximately 0.17 kg to 0.38 kg per °C [
14,
15]. This underscores the importance of precise air temperature monitoring, particularly from a microclimatic perspective.
There are two main types of temperature sensors: contact and non-contact sensors. Non-contact sensors, such as infrared devices, utilise radiated energy, while contact sensors include thermistors, thermocouples, resistance temperature detectors (RTDs), and semiconductor sensors [
16]. Thermistors are resistors made from ceramic materials, such as oxides of nickel, manganese, or cobalt, and exhibit significant changes in resistance with temperature [
17,
18]. Thermocouples consist of two metals joined together at two junctions: one measures the temperature (hot), while the other connects to a reference body (cold). RTDs, such as platinum resistance thermometers (PRTs), use pure metals with resistance proportional to temperature changes. Semiconductor sensors made from silicon measure temperature by detecting the voltage drop across a diode [
16,
19,
20].
The systems currently available for temperature monitoring, including fixed meteorological stations, mobile meteorological stations, and satellite-based systems, face limitations. Fixed meteorological stations, as their name suggests, are situated at a single point in a specific location. These stations are equipped with advanced thermometers that provide accurate air temperature readings; however, they are limited in terms of spatial coverage and entail high deployment costs [
21,
22]. Mobile meteorological stations provide an advantage over fixed stations by offering data from multiple spatial points within a region, thus enhancing coverage. However, they have the limitation of not delivering data over a continuous time series and necessitate relocation each time [
22]. Another method for monitoring temperature is the use of remote sensing tools, such as satellites. These systems utilize infrared radiation to monitor temperature. In the absence of dense networks of fixed and mobile meteorological stations, these satellites can provide spatial-temporal distribution, which can be used as a parameter to complement the meteorological stations. However, their feasibility is limited in indoor settings and under cloud cover [
23].
The integration of wireless sensor networks (WSNs) that utilize low-cost air temperature sensors into precision agriculture offers a promising solution to complement current monitoring systems. WSNs are major drivers of precision agriculture; they consist of hundreds or even thousands of compact micro-sensors with wireless communication capabilities [
7,
8]. The term “low-cost sensors” has been interpreted in many ways by different researchers [
24]. These sensors are typically small and low-cost; thus, they can be deployed in harsh environments like an agricultural field without risk of significant losses due to sensor damage. These sensors have enabled the feasibility of monitoring micro-climates since they can be deployed in large numbers. Most of these sensors fall in the category of thermistors and semiconductors. They are typically ten or more times less expensive than conventional meteorological sensors, which are generally expensive due to their complexity and require a lot of maintenance [
25]. These sensors, however, face limitations in terms of data reliability, as numerous studies have pointed out [
26]. Calibration has been recognised as a crucial step in addressing concerns of reliability.
These sensors have been shown to provide accurate data with scientifically correct calibration methods. Calibration entails comparing the readings of the sensor to a reference instrument in either a controlled setting, such as a laboratory, or in a co-location setting either indoors or outdoors to assess sensor performance in its intended place of application [
24,
27]. Calibration of low-cost sensors can be viewed primarily in two ways: by comparison with high-end reference instruments or using low-cost calibration methods. The choice of calibration method is subjective and depends on whether the individual tends to be a maximiser or a satisfier. A maximiser seeks the best possible outcomes, often leveraging state-of-the-art instruments for sensor calibration. Such instruments are usually resource-intensive, and their use can be unfeasible without proper access. In contrast, a satisfier’s objective is to achieve reliable calibration results at a lower cost, prioritising cost-efficiency and simplicity in their calibration procedures [
28].
Despite the numerous studies on the calibration of low-cost air temperature sensors, a limited number of studies have conducted calibration in field settings. Additionally, several prior reviews have covered the calibration of low-cost sensors in a broader environmental area. Karagulian et al. (2019) reviewed the performance of low-cost air quality sensors, highlighting the need for more unified calibration approaches, though their study primarily focused on pollution sensors, with minimal reference to temperature measurement [
29]. Chojer et al. (2020) similarly emphasized the general lack of standardized reporting across low-cost sensor calibration studies, additionally noting that most studies lack calibration results, limiting their comparability [
26]. However, there is a lack of comprehensive reviews to date primarily focusing on the calibration of low-cost air temperature sensors, underscoring the need for a focused and rigorous study. To address these challenges, this study conducted a systematic review of the present methods and models of calibration, the types of low-cost temperature sensors used, and their advantages and limitations. This review followed the 2020 PRISMA guidelines for critical analysis of the existing literature.
The objectives of this study are the following:
To identify and analyse the calibration models and their calibration methods in order to understand which models perform best.
To determine which type of low-cost sensor is the most used in temperature measurement.
To analyse the performance of sensors under various influencing factors.
To identify the settings where these types of sensors are most utilised, whether outdoor or indoor. Although the aim was to find studies calibrated at temperatures ranging from −10 °C to 40 °C, which are typical of agricultural conditions, the focus was widened to include more studies because few papers covered this range, particularly in the agricultural context.
To identify the challenges that current calibration models and methods pose to researchers in this field.
2. Methodology
This study conducted a systematic literature review following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines to ensure transparency and minimize bias in the studies collected [30,31] (Figure 1).
This review utilized specific criteria for selecting studies for inclusion and exclusion. These criteria outline the types of scientific papers that were deemed eligible for inclusion in this review.
- Studies eligible for inclusion were published journal papers and conference papers; these studies were required to be full-text published papers. No restrictions were placed on peer-review status.
- Studies eligible for inclusion had to focus on sensor calibration, specifically testing low-cost air temperature sensors (LCSs).
- “Low-cost sensors” were those priced below USD 50. This threshold was agreed upon because global prices vary with tax policies, material costs, and sensor availability, with costs ranging from USD 5 in some countries to USD 20 in others. The sensors used must be compatible with low-cost microcontrollers such as Arduino (https://www.arduino.cc, accessed 15 October 2024) or Raspberry Pi (https://www.raspberrypi.com, accessed 15 October 2024). No restrictions were placed on the board model; any board from these platforms was considered eligible.
- Despite no formal restriction on the publication dates, the focus on sensors compatible with microcontrollers such as Arduino and Raspberry Pi introduced a timeline boundary, since these microcontrollers were introduced in the period between 2005 and 2012.
- The selected studies’ methodology must detail the sensor calibration process. Because calibration-focused studies were limited, additional studies that utilised calibration models for sensor validation were eligible for inclusion to reduce bias resulting from the number of studies.
- The studies were required to include details on the performance metrics utilised for sensor and calibration assessment.
- Studies that failed to specify the type of low-cost sensor utilised, lacked performance metrics, did not specifically mention the use of low-cost air temperature sensors, or were not published as full-text studies were excluded.
- There was no restriction on the settings where the experiments were conducted.
- No restrictions were placed on the language of the studies, since most of the calibration data were presented as numerical figures and graphs.
The selection process lasted from August 2024 to November 2024. The studies were sourced from Google Scholar, Scopus, and Web of Science. To ensure literature saturation, we sourced studies from the reference lists of the eligible studies.
The search strategy was developed using the Population Exposure Outcome (PEO) framework (Figure 1).
Using this framework, the keywords were categorised into three categories:
- Population: Described the types of sensors, leading to the selection of keywords such as “Low-cost”, “Temperature sensors”, “Arduino”, “Raspberry Pi*”, and “Air temperature”.
- Exposure/Intervention: Implied interventions placed on the experimented sensor, leading to the selection of keywords such as “Sensor calibration”, “Sensor correction”, and “Calibrate*”.
- Outcome: Entailed the expected outcomes of the interventions applied on the sensors in terms of performance metrics. This led to the selection of keywords such as “Calibration performance”, “Reliability of sensors”, “Sensor accuracy”, “sensor error”, and “Calibration error”.
These keywords were used for the literature search on Scopus and Web of Science. Boolean operators such as “AND” and “OR” were applied on Web of Science and Scopus, and additional operators like “AND NOT” were used to exclude unrelated studies. This exclusion was necessary because the initial search returned a large number of studies on low-cost pollution sensors such as particulate matter (PM), CO2, and NO2 sensors, rather than low-cost air temperature sensors. This led to the formulation of search strings such as the following:
- “Calibration” AND “air temperature” AND “sensors” AND NOT “CO2 concentration” AND NOT “Pm2.5/Pm10”.
- “Calibration performance” AND “air temperature sensors”.
- “Accuracy” AND “Calibration” AND “Low-cost” AND “Temperature sensors” AND “Arduino” OR “Raspberry Pi*”.
- “Temperature calibrations”.
- “Low-cost” AND “Calibration” AND “air temperature sensors” AND “arduino” OR “Raspberry Pi”.
Additionally, key sentences without Boolean operators were utilised on Google Scholar and Web of Science; these are “Calibration of low-cost air temperature sensor”, “Performance of low-cost air temperature sensor”, “air temperature sensor calibration techniques”, “Reliability of low-cost air temperature sensors”, “Comparative analysis of low cost air temperature sensors”, and “Low cost air temperature sensors performance evaluation for air temperature measurement”. To enhance the process, the exclusion and inclusion filters available on Scopus and Web of Science were utilised. On Web of Science, keywords like “particulate matter”, “nitrogen dioxide”, “particulate matter pm”, “soil water content”, “pollution measurement”, and any other keywords related to pollution and gas sensing were excluded using the “EXCLUDE” option on the database. On Scopus, the option of “Limit to” OR “Exclude” was used, limiting the studies to articles, conference papers, and reviews only. The option was also used to limit and exclude keywords according to our inclusion and exclusion criteria.
The selection process yielded an initial 296 studies on Scopus and 275 studies on Web of Science; after applying the filters mentioned, the final studies for each database were 87 studies and 79 studies, respectively (see
Figure 2). These studies were assessed manually for title relevance, yielding 35 and 17 studies for Scopus and Web of Science, respectively. For each key sentence, the first 50 studies were assessed for title relevance on Google Scholar during each search round. The abstract of each study was evaluated, following the previous inclusion criteria, and the selected studies were added to a shared Google Sheet. This process resulted in a final number of 46 studies, including 22 low-cost sensors (see Tables 1 and 2). Of these 46 studies, nine were sourced from references in sections from the selected studies. Among the collected studies, five were published in Indonesian and were translated into English using the Microsoft Word feature for translating documents [
32,
33,
34,
35,
36]. Three independent reviewers conducted the screening process. While efforts were made to document all exclusions, some discrepancies arose in tracking the exact number of excluded studies. Thus, the data provided are an estimate within a reasonable confidence range. This limitation was addressed by cross-checking exclusion criteria and ensuring consistency in the selection process.
The selected studies were organized into the following categories:
- Sensor type, that is, the sensing principle used to measure temperature, whether a thermistor or a semiconductor.
- Calibration model or the validation analysis used to compare or calibrate the sensors.
- Studies that took a direct approach, with no statistical model, were classified under direct comparison, along with studies that gave no information on the models used.
- Environmental settings of the studies where calibration was carried out. These were categorised into indoor or controlled environments and outdoor or uncontrolled environments.
- Performance accuracies: These were presented in terms of errors such as RMSE (root mean square error), MSE (mean square error), mean error, MAE (mean absolute error), standard deviation, and uncertainty. The RMSE gives the typical magnitude of the error of the sensor or model relative to the reference, without direction, and weights large errors more heavily than small ones. The MAE gives the average magnitude of the errors without direction. The mean error is the average of the errors with direction; a non-zero value hints that the system may experience bias. Additional performance metrics on calibration model performance were included from the studies; these included the R² (coefficient of determination) values, which show how well the reference sensor and the calibrated sensor agree.
- Additional details like the reference sensor, duration of calibration, and calibration models used (linear regression, polynomial regression, and machine learning algorithms) were collected.
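The error metrics listed above can be illustrated with a minimal sketch. The function name `calibration_metrics` and the readings are hypothetical and are not taken from any reviewed study. R² is computed here as the coefficient of determination of the sensor readings against the reference, which is an assumption, since some studies may instead report the squared correlation coefficient.

```python
import math

def calibration_metrics(reference, sensor):
    """Return common error metrics for sensor readings against a reference."""
    errors = [s - r for s, r in zip(sensor, reference)]  # signed errors, in °C
    n = len(errors)
    mean_ref = sum(reference) / n
    ss_res = sum(e ** 2 for e in errors)                  # residual sum of squares
    ss_tot = sum((r - mean_ref) ** 2 for r in reference)  # total sum of squares
    mse = ss_res / n
    return {
        "mean_error": sum(errors) / n,             # keeps direction, reveals bias
        "mae": sum(abs(e) for e in errors) / n,    # magnitude only
        "mse": mse,
        "rmse": math.sqrt(mse),                    # penalises large errors more
        "r2": 1.0 - ss_res / ss_tot,
    }

# Hypothetical readings in °C (illustrative only).
reference = [20.0, 22.0, 24.0, 26.0]
sensor = [20.5, 22.5, 23.5, 27.5]
metrics = calibration_metrics(reference, sensor)
```

In this example the single 1.5 °C outlier makes the RMSE larger than the MAE, while the positive mean error points to a warm bias; this is why the metrics are not interchangeable when results from different studies are compared.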
While no formal tool or software was used to assess for bias, a structured review process was established to minimise bias in the selected studies. These measures were as follows:
- The data collection process was conducted on a shared Excel file and a Google Sheet document to ensure transparency among reviewers.
- The reviewers would occasionally check for duplicated studies within the Excel file or Google Sheet and flag them for exclusion. Additionally, details on journal and authors were included in the studies. This enabled the identification of similar research published under different titles. The more detailed or original of the duplicates was retained for inclusion. These studies were labelled as high risk.
- For data transparency within the collected studies, reviewers would passively read the sections of the studies. Studies that lacked numerical results or presented unsupported claims in the discussion or conclusion sections, without corresponding data in the results, were excluded (also labelled as high risk). Preference was given to studies that clearly reported performance metrics (e.g., RMSE, MAE, or R²) and calibration models.
However, due to the limited number of studies, those with a moderate risk of bias were accepted. These were studies that presented calibration or validation methodologies, but only superficially. Additionally, five studies were translated from Indonesian to English using Microsoft Word 365’s built-in translator (Microsoft® Word for Microsoft 365 MSO (Version 2505 Build 16.0.18827.20102) 64 bit). While these studies may introduce bias due to translation risks, the numerically presented calibration details were checked by one of the reviewers, Dr. István Mihály Kulmány, who is experienced in sensor calibration [4]. An additional translation test was conducted using Google Translate, where we selected random paragraphs from the texts and compared the translation from Google Translate with the Microsoft translation. The results showed minor phrasal differences, but the meaning of each paragraph was the same in both. These studies were accepted for inclusion since the data presented could be followed.
The studies collected were synthesized into groups for calibration models, sensor types, and performance across sensor types and across calibration models. Studies that were grouped into calibration models provided details on the type of sensor utilised and the performance of the sensor before and after calibration. Additionally, due to the limited number of calibration-focused studies, studies utilising similar models for sensor validation were included in the synthesis described in Section 3.1. For sensor types and performance evaluation, studies that applied direct comparison were also included to compare sensor performance across various temperatures.
The findings of the synthesis are presented in Section 3, and the results are elaborated in Section 4.
4. Discussion
This systematic review critically analysed 46 studies evaluating the calibration and validation of low-cost air temperature sensors across different models, experimental settings, temperature ranges, environments, and performance metrics. The findings from the studies show clear trends and limitations that directly affect the applicability of these sensors, particularly in precision agriculture and other field- or outdoor-based applications.
4.1. Calibration Model Trends
The findings of this review indicate that linear regression was the most commonly applied calibration or validation model, used in 17 of the 46 studies, primarily due to its simplicity and availability. Most of these studies used the model to validate and calibrate low-cost air temperature sensors, particularly under indoor conditions at temperatures of 20 °C to 27 °C. Its calibration performance was demonstrated in various studies, where sensor performance improved after its application. Some studies reported an R² improvement from 0.92 to 0.99, while others showed an R² improvement from 0.98 to 0.99. Reductions in error were also noted; for instance, the standard error was reported to decrease by 48% and the average error was reported to decrease from 1.07 °C to 0.05 °C after model application during calibration [37,38,40].
The effectiveness of the model as a validation tool for sensor performance was also demonstrated in several studies, which reported high R² values (>0.90) when sensor outputs were compared against reference measurements [44,45,48]. However, despite their effectiveness in short-term validation, these models exhibit limited flexibility in long-term deployments. Since the model requires manual input to generate correlation equations for calibration, it is limited in adjusting for sensor drift or environmental changes over time. As demonstrated in [41], calibration equations derived at the beginning of a long deployment became ineffective after 20 months due to sensor drift, and a new sensor-specific calibration equation had to be determined.
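A linear calibration of this kind can be sketched as an ordinary least squares fit of the reference readings against the raw sensor readings. The sketch below is illustrative only: the function names and the co-location data are hypothetical, and the synthetic sensor is assumed to have a pure gain and offset error, which real sensors only approximate.

```python
def fit_linear_calibration(sensor, reference):
    """Least squares fit of: reference = slope * sensor + intercept."""
    n = len(sensor)
    mean_s = sum(sensor) / n
    mean_r = sum(reference) / n
    sxx = sum((s - mean_s) ** 2 for s in sensor)
    sxy = sum((s - mean_s) * (r - mean_r) for s, r in zip(sensor, reference))
    slope = sxy / sxx
    intercept = mean_r - slope * mean_s
    return slope, intercept

def apply_calibration(sensor, slope, intercept):
    """Correct raw sensor readings with the fitted equation."""
    return [slope * s + intercept for s in sensor]

def mean_abs_error(values, reference):
    return sum(abs(v - r) for v, r in zip(values, reference)) / len(values)

# Hypothetical co-location data in °C: sensor = 1.05 * reference + 0.1.
reference = [20.0, 22.0, 24.0, 26.0, 28.0]
sensor = [21.1, 23.2, 25.3, 27.4, 29.5]

slope, intercept = fit_linear_calibration(sensor, reference)
calibrated = apply_calibration(sensor, slope, intercept)
mae_before = mean_abs_error(sensor, reference)    # error of the raw readings
mae_after = mean_abs_error(calibrated, reference) # error after correction
```

Because the slope and intercept are fixed once fitted, the equation cannot follow a drifting sensor; under drift, the fit would have to be repeated with fresh co-location data.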
Machine learning algorithms, although only used in a few studies, demonstrated high potential for outdoor calibration or validation of low-cost air temperature sensors. These models were reliable in calibrating outdoor environmental sensors, primarily because they were able to tackle various variables that affect sensor accuracy [
39,
55]. These algorithms can be implemented on low-cost microcontrollers like the Arduino and Raspberry Pi [
39,
57]. Machine learning algorithms such as random forest (RF), support vector machines (SVMs), and artificial neural networks (ANNs) can be effectively executed on low-cost microcontrollers like the Raspberry Pi [
57,
73]. Studies in this review have demonstrated that machine learning algorithms, such as artificial neural networks (ANNs), backpropagation neural networks (BNNs), and long short-term memory recurrent neural networks (RNN-LSTMs), have high predictive accuracy in field conditions, particularly when trained on ground truth data [
39,
55,
56]. However, the general performance of machine learning algorithms depends on the specific application and target level of accuracy. Research shows that, for these algorithms to yield reliable results, the ground truth data must be highly accurate [
74]. This precision can result from the use of low-cost sensors in close proximity to high-end reference stations [
39], using advanced mobile reference equipment [
66], or high-quality environmental data from authoritative agencies.
The performance of these models is highly dependent on the reliability of input variables. In the 2017 Japan study [39], the ANN model identified solar radiation as the most influential predictor of air temperature. At around 16:00 h, when partial cloud cover unexpectedly disrupted the solar radiation measured by the collection equipment, the model significantly under-predicted air temperature, resulting in large calibration errors. This is a basic limitation of ANN-based calibration models: vulnerability to environmental variability in key input parameters. It indicates the importance of high-quality ground truth data for the model to perform well in calibration. This raises the question of how effective machine learning algorithms are as a low-cost calibration technique. Their effectiveness is relative, since “low-cost” depends on the user and application context. For example, Yamamoto et al. [39] found that the use of an ANN to calibrate the SHT-71 sensor was much more cost-effective compared to using expensive, high-end radiation shields for the temperature sensor.
While machine learning and linear regression models were more commonly applied in the studies covered, only one study employed polynomial regression for sensor calibration [54]. A second-order polynomial was applied in that study for the validation of multiple sensor types. The study reported minimum error deviations at 15 °C, with diminishing performance at the higher temperature range of 25 °C to 35 °C. Overall, polynomial regression resulted in a mean error of 0.249 °C for the prototype, suggesting some potential for controlled environments. However, the broader applicability of the model in low-cost air temperature calibration remains unclear due to the limited number of studies identified in this review. Future studies should therefore explore polynomial regression as a calibration alternative to linear regression models, particularly where a nonlinear sensor response is observed; this could be the case for thermistor-based sensors, whose performance varies across temperature ranges [32,47,68,70] (Table 4).
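The difference between a linear and a second-order polynomial calibration can be sketched on a synthetic nonlinear response. The data below are hypothetical: the sensor error is assumed to grow quadratically away from 15 °C, which is chosen for illustration and is not a model of any sensor in the reviewed studies. The sketch uses NumPy's `polyfit` and `polyval`.

```python
import numpy as np

# Hypothetical data in °C: error grows quadratically away from 15 °C.
reference = np.array([10.0, 15.0, 20.0, 25.0, 30.0, 35.0])
sensor = reference + 0.004 * (reference - 15.0) ** 2

def fit_and_score(degree):
    """Fit reference as a polynomial of the sensor reading; return coefficients and RMSE."""
    coeffs = np.polyfit(sensor, reference, deg=degree)
    calibrated = np.polyval(coeffs, sensor)
    rmse = float(np.sqrt(np.mean((calibrated - reference) ** 2)))
    return coeffs, rmse

_, rmse_linear = fit_and_score(1)     # straight-line calibration
_, rmse_quadratic = fit_and_score(2)  # second-order polynomial calibration
```

On data like these, the quadratic fit leaves a smaller residual than the straight line because it can follow the curvature. With only a few calibration points, however, a higher-order fit can also follow noise, so the comparison should be made on held-out data wherever possible.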
4.2. Factors Affecting Calibration and Sensor Performance
Calibration performance and sensor performance were noted to be influenced by environmental factors such as solar radiation, airflow, and ambient temperature range. Additionally, aspects of experimental design, such as calibration duration and distance between sensor and reference instrument, appeared to contribute to the variability of both sensor and calibration performance. The influence of these factors often determined whether calibration models and sensor performance yielded accurate readings with lower errors [
32,
36,
44,
59].
Indoor or controlled environments were noted to yield higher performance in sensor readings, whereas performance often degraded in field or outdoor settings. For instance, Ref. [61] reported lower errors when the DHT22 was validated in a controlled laboratory environment, with a maximum error of 0.85 °C. In a similar study [48], the DHT22 was reported to present a maximum error of 0.66 °C in controlled laboratory conditions at temperatures between 20 °C and 40 °C. However, the error increased to 1.85 °C after the sensor was deployed in a field setting (uncontrolled conditions) at temperatures between 22 °C and 34 °C. Similarly, other sensors, like the DHT11, were reported to have a maximum error of 2.81 °C in the field compared to 0.22 °C in the controlled environment [48]. Likewise, Ref. [46] found that calibration in a laboratory (controlled environment) produced an R² of 0.9969 for the PR103J2 sensor, while the R² for the same sensor in an office setting (uncontrolled environment) dropped to 0.9638. Other sensor types, such as the DS18B20, were also noted to have reduced accuracy in uncontrolled environments [66].
Solar radiation stands out as the most significant environmental factor influencing the calibration of sensors in outdoor conditions. Numerous studies have indirectly highlighted this parameter: even when calibration took place indoors, researchers took care to avoid direct sunlight, placing sensors either in shaded room corners or within climate-controlled chambers. For instance, Yamamoto et al. [39] specifically investigated the solar radiation effects on the accuracy of the SHT15 sensor. Prior to outdoor deployment, the authors conducted a laboratory sensor validation test that resulted in an MAE of 0.19 °C. During the outdoor calibrations, the MAE increased to 1.8 °C; in particular, higher errors were observed at peak hours of solar radiation around noon [39].
Sun et al. [75] further explored the correlation between solar radiation and the errors present in the SHT15 temperature sensor, identifying that approximately 60% of the observed errors were due to solar radiation. Additionally, they established a direct relationship between atmospheric temperature error (ATE) and solar radiation (SR), reporting that an increase in solar radiation led to an increase in ATE. For instance, an increase in solar radiation from 0.01 to 3.01 MJ m⁻² (megajoules per square metre) led to an increase in ATE from 0.01 °C to 6.06 °C, highlighting the significance of solar radiation in ATE. Similarly, Young et al. [76] assessed the performance of a custom radiation shield designed to mitigate the ATE caused by solar radiation and air velocity. They reported larger errors ranging from −0.76 °C to 2.56 °C, typically occurring around sunrise, possibly due to low sun angles allowing sun rays to penetrate the radiation shield.
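As a rough worked example, the two endpoint values reported by Sun et al. can be turned into an average sensitivity of ATE to solar radiation. This assumes, purely for illustration, that the change between the two endpoints is spread evenly; the study's actual relationship need not be linear.

```latex
\frac{\Delta \mathrm{ATE}}{\Delta \mathrm{SR}}
  = \frac{6.06\,^{\circ}\mathrm{C} - 0.01\,^{\circ}\mathrm{C}}
         {3.01\ \mathrm{MJ\,m^{-2}} - 0.01\ \mathrm{MJ\,m^{-2}}}
  = \frac{6.05\,^{\circ}\mathrm{C}}{3.00\ \mathrm{MJ\,m^{-2}}}
  \approx 2.02\,^{\circ}\mathrm{C}\ \text{per}\ \mathrm{MJ\,m^{-2}}
```

Under this assumption, each additional MJ m⁻² of solar radiation corresponds to roughly 2 °C of additional temperature error for an unshielded sensor of this type.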
Furthermore, Wang et al. [
58] conducted sensor reading corrections using a BNN model for the SHT15 sensor in outdoor settings to reduce the effects of solar radiation on ATE. Higher errors were observed at noon, when the solar radiation was at maximum. Additionally, after using solar radiation as a variable in the correction model (BNN), the mean maximum absolute error significantly reduced by 8% more than in the study by Sun et al., 2015 [
75]. These studies highlight the significance of solar radiation in sensor errors. While some studies utilised radiation shields [
76], others utilised machine learning algorithms to reduce the effects of solar radiation on sensor performance [
39,
55,
74].
Humidity was noted to affect sensor performance; this effect was predominantly observed in specific sensors, such as the DHT22, an NTC (negative temperature coefficient) thermistor-based sensor [77]. This sensor measures both humidity and temperature; thus, the effects of humidity on temperature error could be explained by the sensor’s properties. However, the scarce literature addressing the environmental impacts on such sensors complicates the identification of the optimal low-cost air temperature sensor type for outdoor deployments.
Solar radiation consistently emerged as the most dominant factor affecting temperature sensor accuracy, with fewer references to wind and humidity. These findings emphasize the necessity for further research into the effects of environmental factors on specific low-cost sensors, particularly regarding sensor housing. Future studies should therefore focus on the influence of environmental factors on commonly used sensors like the DHT22, DS18B20, LM35, DHT11, BME280, and other low-cost air temperature sensors. Moreover, other environmental variables should also be considered.
Temperature change and range were noted to significantly influence calibration outcomes and sensor performance. For instance, a comparative analysis between low-cost and high-end calibration setups revealed that rapid temperature increases during the heating phase negatively affected calibration accuracy. Specifically, the AM2315 sensor calibrated in Cuba (from 20 °C to 65 °C within 5 min) showed lower R² values compared to calibration in Belgium, where a more gradual increase (20 °C to 50 °C over 41 min) yielded higher R² values (0.99–1.0 vs. 0.98) [28]. This highlights the importance of aligning temperature change rates during calibration with those expected in real-world applications in order to better understand how the sensor will respond under real-world conditions. These findings suggest that environmental setting contributes significantly to sensor accuracy and thus should be considered during calibration.
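The two heating profiles can be compared by their average ramp rates. The short sketch below uses only the temperatures and durations quoted above; the function name `heating_rate` is hypothetical, and the Cuban profile is treated as taking the full 5 min, so its rate is a lower bound.

```python
def heating_rate(start_c, end_c, minutes):
    """Average temperature change rate in °C per minute."""
    return (end_c - start_c) / minutes

rate_cuba = heating_rate(20.0, 65.0, 5.0)      # rapid ramp: 45 °C in 5 min
rate_belgium = heating_rate(20.0, 50.0, 41.0)  # gradual ramp: 30 °C in 41 min
ratio = rate_cuba / rate_belgium               # how much faster the rapid ramp is
```

The rapid ramp works out to 9 °C/min against roughly 0.73 °C/min for the gradual one, a difference of more than an order of magnitude, which gives the sensor far less time to settle at each temperature.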
4.3. Methodological Gaps and Inconsistencies
Inconsistencies were noted in how the methodologies for sensor validation and calibration were presented. These methodological weaknesses reduce the comparability of the results and limit the transfer of the findings to real-world agricultural or environmental applications.
Multiple studies were noted to use different performance metrics for analysis, limiting comparability between the collected studies. Even among studies using the same sensor type (for instance, the DHT22), some reported results using RMSE, R², and MAE, while others reported percentage error deviation or standard deviation [
54,
60,
62,
77]. Variations were also observed in calibration methods across studies, with different durations for calibration and validation. This complicates sensor performance assessment, as each sensor type has a unique response time, and can lead to misinterpretation of data.
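As an illustration of why mixed metrics hinder comparison, the sketch below computes several of the metrics mentioned above from a single pair of reference and sensor series. The function name `sensor_metrics` and the readings are hypothetical and are not taken from any of the reviewed studies; R² is computed here as 1 − SS_res/SS_tot with the sensor reading treated as the prediction of the reference.

```python
import numpy as np

def sensor_metrics(reference, measured):
    """Compute common agreement metrics between a sensor and a reference."""
    reference = np.asarray(reference, dtype=float)
    measured = np.asarray(measured, dtype=float)
    error = measured - reference
    ss_res = np.sum(error ** 2)
    ss_tot = np.sum((reference - reference.mean()) ** 2)
    return {
        "bias": float(error.mean()),                   # mean error
        "sd": float(error.std(ddof=1)),                # precision of the error
        "mae": float(np.abs(error).mean()),            # mean absolute error
        "rmse": float(np.sqrt(np.mean(error ** 2))),   # root mean square error
        "r2": float(1.0 - ss_res / ss_tot),            # coefficient of determination
    }

# Hypothetical readings: the sensor reads a constant 0.5 °C too high.
reference = [10.0, 15.0, 20.0, 25.0, 30.0]
measured = [10.5, 15.5, 20.5, 25.5, 30.5]
metrics = sensor_metrics(reference, measured)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

In this contrived case the standard deviation of the error is zero and R² is 0.995, while the bias, MAE, and RMSE are all 0.5 °C. A study reporting only the standard deviation or R² would therefore suggest a near-perfect sensor, whereas one reporting bias or RMSE would reveal the offset.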
4.4. Insufficient Calibration Reporting Across Studies
Nearly 14 of the studies did not conduct calibration, and this number could be even larger due to the lack of clear details in the studies that claimed to have calibrated the sensors. A review of low-cost sensors published in 2020 also reported similar findings, noting that only 16 out of 35 studies performed calibration; this figure is rather small, considering the study did not limit itself solely to temperature sensors [
26]. They also observed a lack of standardised procedures and performance presentations for calibration. Another review study similarly identified a deficiency in standardised calibration procedures by examining papers that mentioned the use of low-cost sensors; although that review was not specific to air temperature sensors, its findings remain relevant [
26].
A significant challenge identified is that most studies did not perform or report post-calibration validation results, with only 13 studies mentioning them. This makes it difficult to assess how much calibration improved each sensor’s performance and how that performance changes after deployment, whether in indoor or outdoor settings. This information is essential for refining and strengthening calibration plans to align more effectively with a sensor’s application.
The lack of studies conducting calibration or sensor validation in field settings also presents a challenge, since this review is intended to inform precision agriculture applications.
4.5. Proposed Guidelines for the Standardization of Calibration Methods for Low-Cost Sensors
Due to the absence of a standardized calibration procedure for low-cost sensors (LCSs), significant variability has been observed across studies, limiting the comparability of findings on a global scale. To address this, we propose the following recommendations to support the development of a unified calibration framework:
Calibration Duration and Replicability: A standardized, agreed-upon calibration duration should be established to allow consistent replication across studies. Additionally, the length of the calibration period should be clearly defined and reported.
Reference Instrumentation: The reference device used during calibration should be clearly described, including its specifications, such as resolution, accuracy, and measurement uncertainty. This will allow for the use of locally available instruments that meet the minimum reference criteria to reduce dependency on expensive or imported equipment.
Calibration Environment: It is preferable to conduct calibration in a controlled environment to identify systematic errors. Following this, an in situ calibration phase should be carried out to account for environmental variability and improve real-world applicability.
Calibration Model Documentation: The calibration model utilised should be documented, including factors such as the amount and nature of data used during calibration and the software or programming environment used for implementation. For machine learning-based models, details on the preprocessing steps of the data involved should be outlined. The dependent and independent variables used as inputs to the model should also be documented.
Performance Evaluation Metrics: Performance metrics such as the coefficient of determination (R²), root mean square error (RMSE), residual errors, and the full calibration equation (if applicable) should be included, especially for model assessment. For sensor performance assessment, the standard deviation (for precision), mean error (bias), mean absolute error (MAE), and RMSE (which weights larger errors more heavily) should be included.
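The reporting items listed above could be captured alongside the fitted model so that they travel with the results. The following is a minimal sketch, assuming a simple linear calibration fitted by least squares; the names `CalibrationRecord` and `fit_linear_calibration`, the field layout, the reference instrument label, and the readings are all hypothetical and serve only to illustrate the idea.

```python
from dataclasses import dataclass, asdict
import numpy as np

@dataclass
class CalibrationRecord:
    """Hypothetical container for the items the guidelines ask to be reported."""
    sensor: str
    reference: str        # reference instrument description
    environment: str      # e.g. controlled chamber or in situ
    duration_min: float   # length of the calibration period
    n_samples: int        # amount of data used
    slope: float          # calibration equation: T_cal = slope * T_raw + intercept
    intercept: float
    r2: float
    rmse: float

def fit_linear_calibration(raw, reference, sensor, reference_name,
                           environment, duration_min):
    """Fit T_ref ~ slope * T_raw + intercept and return a documented record."""
    raw = np.asarray(raw, dtype=float)
    reference = np.asarray(reference, dtype=float)
    slope, intercept = np.polyfit(raw, reference, 1)
    residuals = reference - (slope * raw + intercept)
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((reference - reference.mean()) ** 2))
    return CalibrationRecord(
        sensor=sensor, reference=reference_name, environment=environment,
        duration_min=duration_min, n_samples=int(raw.size),
        slope=float(slope), intercept=float(intercept),
        r2=1.0 - ss_res / ss_tot,
        rmse=float(np.sqrt(np.mean(residuals ** 2))),
    )

# Hypothetical readings in °C; not taken from any reviewed study.
record = fit_linear_calibration(
    raw=[9.0, 14.2, 19.1, 24.3, 29.0],
    reference=[10.0, 15.0, 20.0, 25.0, 30.0],
    sensor="DHT22",
    reference_name="reference thermometer (hypothetical, ±0.1 °C)",
    environment="controlled chamber",
    duration_min=60.0,
)
print(f"T_cal = {record.slope:.4f} * T_raw + {record.intercept:.4f}")
print(asdict(record))
```

Keeping the equation, sample count, duration, and environment in one record is one possible way to make a calibration replicable by others; the exact fields would depend on whatever standard is eventually agreed upon.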
5. Conclusions
This study aimed to conduct an exhaustive review of the existing calibration techniques and procedures for low-cost air temperature sensors and to identify the most commonly used low-cost temperature sensors. The review highlighted a significant research gap, particularly in outdoor agricultural calibration, where temperatures range from −10 °C to 40 °C. Despite the increasing use of low-cost sensors for environmental monitoring, studies on their calibration remain limited, particularly in field settings.
The key findings of this study are as follows:
Linear regression was the most prevalent calibration method. However, its performance degraded under varying outdoor environmental conditions, as noted in several studies. Despite this, linear models performed well in controlled environments, with numerous studies reporting R² > 0.95 and improved sensor performance after calibration was applied.
Polynomial regression and machine learning algorithms were also considered by researchers. Polynomial models captured non-linear relationships between calibrated and reference sensors. They provided an alternative to linear regression but were rarely used in low-cost air temperature calibration owing to their greater complexity.
Machine learning algorithms demonstrated potential for outdoor sensor calibration despite implementation challenges related to input data management.
Thermistors (the DHT22 and DHT11) and semiconductor-based sensors were the most utilized sensors in the studies found.
Variations in calibration techniques and performance metrics were noted in the studies. A number of studies presented different performance metrics, limiting comparability between studies.
Environmental factors, notably solar radiation, were found to significantly affect sensor accuracy. Several studies that conducted sensor calibrations in outdoor environments reported sensor drifts attributed to solar radiation. Humidity also influences specific sensor types, particularly the DHT22, due to its dual capability to measure temperature and humidity. Other combined humidity and temperature sensing devices are suspected to encounter similar limitations; however, this remains unproven due to a lack of comprehensive studies in this area. While studies have recommended using radiation shields, their high cost limits their applicability in low-cost environmental monitoring initiatives. Alternatively, some research has proposed using machine learning algorithms to compensate for the impact of environmental factors on sensor accuracy [
39].
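The idea of compensating for environmental factors can be illustrated with a deliberately simple stand-in for the machine learning approaches mentioned above: an ordinary least-squares model that takes solar radiation and humidity as additional independent variables. The sketch below uses synthetic data; the error coefficients (0.004 °C per W/m² and 0.01 °C per % RH), the noise level, and the helper name `fit_rmse` are assumptions chosen for illustration, not values reported in the reviewed studies.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500

# Synthetic outdoor conditions (assumed ranges).
true_temp = rng.uniform(-10.0, 40.0, n)   # reference air temperature, °C
solar = rng.uniform(0.0, 1000.0, n)       # solar radiation, W/m^2
humidity = rng.uniform(20.0, 95.0, n)     # relative humidity, %

# Assumed error model: radiative heating plus a small humidity effect and noise.
raw = (true_temp + 0.004 * solar + 0.01 * (humidity - 50.0)
       + rng.normal(0.0, 0.2, n))

def fit_rmse(columns, target):
    """Least-squares fit of target on the given columns plus an intercept."""
    X = np.column_stack(columns + [np.ones(len(target))])
    coef, *_ = np.linalg.lstsq(X, target, rcond=None)
    rmse = float(np.sqrt(np.mean((target - X @ coef) ** 2)))
    return coef, rmse

# Calibration using the raw sensor reading only.
_, rmse_linear = fit_rmse([raw], true_temp)
# Calibration with solar radiation and humidity as extra independent variables.
coef, rmse_multi = fit_rmse([raw, solar, humidity], true_temp)

print(f"RMSE, sensor reading only:        {rmse_linear:.2f} °C")
print(f"RMSE, with environmental inputs:  {rmse_multi:.2f} °C")
```

Because the synthetic error was built from the same variables the model receives, the improvement here is an upper bound on what might be expected in practice; real deployments would also need held-out validation data and measurements of the environmental inputs themselves.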
Critical recommendations regarding the calibration and validation of low-cost air temperature sensors are provided within this review. For a standard validation or calibration method, the following are recommended:
Employ performance metrics, such as R² and RMSE, for standard validation, where R² measures the proportion of variance in the reference readings explained by the model and RMSE reflects the typical deviation between actual and predicted readings.
To account for bias, it is recommended that the mean error or average error be used, which accounts for the difference between the reference sensor and the calibrated sensor readings.
Systematically outline the calibration procedure; this should include details on the calibration settings, including duration and conditions.
Incorporate multiple temperature steps across varied ranges, which is necessary for accurate sensor performance evaluation in conjunction with the calibration model. This information will clearly outline how the sensor performs alongside the calibration performance model.
Conduct a post-calibration assessment of sensor performance, especially in the same setting used for calibration, since sensor performance can be affected when the setting is changed. At least two calibration plans are therefore required: the first is carried out in the laboratory under ideal conditions where sensor performance is optimal, followed by a post-calibration assessment to quantify the improvement; the second should be conducted in the setting where the sensor is to be deployed. Carrying out calibration under various conditions is important to improve reliability.
Machine learning algorithms are recommended for outdoor settings; however, the choice of model should be carefully investigated to ensure compatibility with the microcontroller and platform. The integration of machine learning not only refines calibration practices through automatic recalibration but also accounts for environmental factors as independent variables, thereby improving calibration curves.
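The two-plan recommendation above (a laboratory calibration with a post-calibration check, followed by a second calibration in the deployment setting) can be sketched as follows. All data are synthetic: the sensor gain of 0.97, the laboratory offset of 0.8 °C, the additional field offset of 1.5 °C, and the noise level are assumptions made purely for illustration, and simple linear fits stand in for whichever calibration model a study actually adopts.

```python
import numpy as np

rng = np.random.default_rng(1)

def rmse(estimate, reference):
    """Root mean square error between two series."""
    return float(np.sqrt(np.mean((np.asarray(estimate) - np.asarray(reference)) ** 2)))

def simulate_raw(reference, extra_offset=0.0):
    """Assumed sensor behaviour: gain error, offset, and random noise."""
    return 0.97 * reference + 0.8 + extra_offset + rng.normal(0.0, 0.1, reference.size)

# Plan 1: laboratory calibration over several temperature steps (-10 to 40 °C).
lab_ref = np.repeat(np.arange(-10.0, 41.0, 10.0), 20)
lab_coef = np.polyfit(simulate_raw(lab_ref), lab_ref, 1)

# Post-calibration assessment in the same setting, on fresh readings.
check_ref = np.repeat(np.arange(-5.0, 36.0, 10.0), 20)
check_raw = simulate_raw(check_ref)
rmse_lab_before = rmse(check_raw, check_ref)
rmse_lab_after = rmse(np.polyval(lab_coef, check_raw), check_ref)

# Plan 2: field calibration, where an extra offset (e.g. radiative heating) appears.
field_ref = rng.uniform(5.0, 35.0, 200)
field_lab = np.polyval(lab_coef, simulate_raw(field_ref, extra_offset=1.5))
fit_idx, val_idx = slice(0, 100), slice(100, 200)   # fit on one half, validate on the other
field_coef = np.polyfit(field_lab[fit_idx], field_ref[fit_idx], 1)
rmse_field_lab_only = rmse(field_lab[val_idx], field_ref[val_idx])
rmse_field_two_stage = rmse(np.polyval(field_coef, field_lab[val_idx]), field_ref[val_idx])

print(f"Lab check, before/after calibration: {rmse_lab_before:.2f} / {rmse_lab_after:.2f} °C")
print(f"Field, lab calibration only:         {rmse_field_lab_only:.2f} °C")
print(f"Field, after second calibration:     {rmse_field_two_stage:.2f} °C")
```

In this sketch the laboratory calibration performs well when checked in the laboratory but leaves a residual error in the field, which the second calibration removes. Validating on readings that were not used for fitting is what allows the improvement at each stage to be reported separately.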