Article

Calibration of Integrated Low-Cost Environmental Sensors for Urban Air Temperature Based on Machine Learning

1 School of Resource and Environmental Sciences, Wuhan University, Wuhan 430079, China
2 Collaborative Innovation Center of Geospatial Technology, Wuhan 430079, China
3 Hubei Luojia Laboratory, Wuhan University, Wuhan 430079, China
4 Key Laboratory of Geographic Information System, Ministry of Education, Wuhan University, Wuhan 430079, China
5 Key Laboratory of Digital Cartography and Land Information Application, Ministry of Natural Resources, Wuhan University, Wuhan 430079, China
* Author to whom correspondence should be addressed.
Sensors 2025, 25(11), 3398; https://doi.org/10.3390/s25113398
Submission received: 26 April 2025 / Revised: 24 May 2025 / Accepted: 26 May 2025 / Published: 28 May 2025
(This article belongs to the Special Issue Integrated Sensor Systems for Environmental Applications)

Abstract

Monitoring urban microenvironments using low-cost sensors effectively addresses the spatiotemporal limitations of conventional monitoring networks. However, their widespread adoption is hindered by concerns regarding data quality. Calibrating these sensors is crucial for enabling their large-scale deployment and increasing confidence among researchers and users. This study focuses on an internet of things (IoT) application in Wuhan, China, aiming to enhance the quality of long-term hourly air temperature data collected by low-cost sensors through on-site calibration. Multiple linear regression (MLR) and light gradient boosting machine (LightGBM) algorithms were employed for calibration, with leave-one-out cross-validation (LOOCV) being used for model evaluation. Factors, such as multiple scenarios, spatial distances, and seasonal variations, were also examined for their influence on long-term data calibration. The experimental findings revealed that the LightGBM method consistently outperformed MLR. Calibration using this approach markedly improved the sensor data quality, with the R-squared (R2) value of the sensor with the poorest raw data increasing from 0.416 to 0.957, its mean absolute error (MAE) decreasing from 6.255 to 1.680, and its root mean square error (RMSE) being reduced from 7.881 to 2.148. This study demonstrates the application potential of using LightGBM as an advanced machine learning (ML) method in innovative low-cost sensors, thereby providing a method of obtaining high-quality and real-time information for urban environmental and public health research.

1. Introduction

The growing prevalence of climate change and extreme weather events has heightened the importance of meteorological monitoring [1,2]. Meteorological monitoring is essential for daily weather forecasting, providing disaster warnings, and supporting various activities that are critical to production and daily life. It provides fundamental data for characterizing and analyzing local climate patterns, with temperature being a particularly significant meteorological element.
Urbanization, a defining trend of the 21st century, is projected to double the urban population by 2050 [3]. While cities occupy only 2% of the Earth’s surface, urban residents are responsible for the majority of energy consumption, contributing to rising urban temperatures and the urban heat island effect [4,5]. This phenomenon poses significant challenges to achieving sustainable urban development [6,7]. Numerous studies on thermal comfort have emphasized the critical role of maintaining appropriate temperature and humidity levels for human health in both indoor and outdoor settings [8,9,10].
The increasing focus on urban temperature and outdoor thermal comfort has led to extensive research on the influence of urban geometry, the greening effect, and other factors in mitigating urban heat [8,11,12,13,14]. Despite the importance of accurate meteorological data, the sparse distribution of weather stations in many regions limits the availability of dense ground-level observations [15]. This gap often results in inconsistencies in localized temperature data, complicating urban heat island mitigation strategies [16]. Additionally, localized weather events that affect smaller areas present significant challenges to urban management [17]. In summary, conducting refined environmental monitoring at a granular scale within urban areas is critically important.
The advent of low-cost sensor technology in recent years has revolutionized the environmental monitoring landscape [18,19]. While standard monitoring networks established by official agencies provide highly accurate data, their instruments are expensive, complex, and require professional operation, regular maintenance, and strict environmental conditions [20,21,22]. These factors often result in a sparse geographical distribution of monitoring stations. Additionally, data from official agencies are often subjected to restricted access and significant time delays.
In contrast, low-cost sensors are approximately three-orders-of-magnitude less expensive than standard reference instruments [18], which enables their deployment across larger areas at higher densities. Given the substantial variability in urban microenvironments due to complex human activities, the economical and compact nature of low-cost sensors makes them an appealing choice for experiments and applications that require dense spatial mapping [23,24,25,26]. Studies utilizing low-cost sensors to examine the spatial variability of urban air quality [27,28] have demonstrated their efficacy in capturing environmental changes at fine spatial scales.
Low-cost sensors offer significant potential to address the spatial and temporal limitations of standard monitoring networks [29,30]. Deploying high-density, low-cost sensors in localized areas presents many benefits, but concerns about data quality hinder their widespread adoption [31,32,33]. Public and community interest in low-cost sensors is growing, yet the absence of reference instruments for comparison during their application [34] often results in unreliable data. These sensors are highly sensitive to environmental changes [34,35,36,37,38], and laboratory calibration alone fails to resolve these issues effectively [39]. Overall, improving the quality of the data collected by low-cost sensors through calibration is critical to enhancing user confidence and facilitating their large-scale application. Machine learning (ML) approaches have therefore emerged as a leading calibration option for large-scale applications [19,40,41,42]. However, factors such as the experiment duration, which is often overlooked, can affect calibration performance. Additionally, small-scale or unevenly distributed studies limit the generalization of calibration methods. Most studies on low-cost sensors focus on air quality [36,43,44,45], with relatively few studies leveraging urban meteorological data from low-cost sensor monitoring networks.
In past studies on the calibration and measurement of air temperature sensors, Liu et al. [46] proposed a calibration scheme and established a data calibration model using a backpropagation neural network, but the data only included 10 cloudy days in June and July, and only air temperature data were used as the input and output. Another study used typical liquid thermostats and climate chambers for measurements to determine the characteristics of a calibration system, but the cost of high-precision experimental equipment could not be ignored [47]. Sun et al. [48] developed a solar radiation lookup table, but the method relied on solar radiation data sensed by high-cost pyranometers and was only used to process 14 days’ worth of data. Yamamoto et al. [1] used an artificial neural network to balance the effects of multiple environmental factors on their measurements, but only in three different locations. Tang et al. [49] used the LightGBM method to correct the results of numerical temperature prediction, but the study did not take into account the application advantages of low-cost sensors. Cao et al. [50] proposed a method for establishing a temperature-prediction model based on time series analysis, but it was mainly aimed at modern agricultural greenhouse systems. A previous case study of low-cost internet of things (IoT) environmental sensors in Wuhan, China, evaluated the accuracy of daily average air temperature data and tested the feasibility of a wireless integrated sensor network application [51]. However, detailed statistics and the calibration of hourly monitoring data remain underexplored, highlighting a gap in the application of low-cost sensors for urban meteorological monitoring.
In this study, we developed a method for calibrating low-cost sensors for long-term temperature measurement, considering different locations and multiple environmental factors. First, the necessity of calibrating low-cost sensors is highlighted by comparing raw air temperature data from low-cost sensors with data from standard meteorological stations. Next, we calibrated a large dataset of hourly air temperature data from long-term field environments using MLR and LightGBM methods and compared the results with the original data to evaluate the improvement obtained. In addition, the calibration model’s transfer capability was evaluated across different surface types and spatial locations. The findings indicate that the proposed calibration method significantly enhances the data quality of low-cost sensors and remains stable across diverse scenes and spatial locations.

2. Materials and Methods

2.1. Data Acquisition

Located in the middle reaches of the Yangtze River, Wuhan is rich in natural resources and diverse surface types, which makes it a representative and suitable research area. According to public data from the Hubei Provincial Government, Wuhan’s urban population reached 14.77 million by the end of 2021, solidifying its status as a megacity. This study is based on an IoT research initiative utilizing low-cost environmental sensors in Wuhan. A total of 252 monitoring units were deployed across Wuhan, and these units were designed for cost-effective operation in real-world environments [51]. Some of these sensors were deployed next to standard meteorological stations for comparison and validation, and the sensors used in this study belong to this category. The low-cost sensors used in this study were integrated and developed by the authors. As shown in Figure 1, the sensors are compact, durable, and easy to install. They can monitor multiple meteorological elements such as temperature, humidity, and pressure, and have functions such as timing, positioning, and wireless data transmission. The details of the sensor elements used are shown in Table 1. These sensors do not rely on wired power; they are powered by lithium batteries and solar energy, which provides energy independence and reduces energy consumption, as described in more detail in our previous research [51]. The integrated design allows for the real-time wireless transmission of data, with user-adjustable sampling intervals. For this experiment, the low-cost sensors were set to sample data once an hour, which was aligned with the period reported by standard weather stations. Data preprocessing included removing duplicates due to multiple responses and missing values due to instrument failure.
Sensor deployment adhered to principles of wide spatial distribution, coverage of diverse scenarios, and proximity to standard monitoring stations. The nine experimental sites within Wuhan ranged from 9 to 86 km apart (see Figure 2). All selected sites host a standard weather station. Our low-cost sensors were deployed alongside the weather stations, and their observation height and solar panel orientation are essentially consistent with those of the weather stations. The sensors were deployed across five surface types: cultivated land, shrubs, woods, built-up areas, and grasslands. Data collection spanned from December 2021 to November 2022, covering four seasons in the Wuhan area. The largest valid dataset recorded by a single sensor comprised 8419 h of data.

2.2. Methods

To evaluate the calibration approach comprehensively, the data were divided into three cases: seasonal data (hourly level), annual data (hourly level), and annual data (daily average level). Seasons were defined according to Chinese regional conventions: spring (March–May), summer (June–August), autumn (September–November), and winter (December–February). The annual data spanned 12 months, from December 2021 to November 2022. Daily average temperatures were calculated using measurements taken at 02:00, 08:00, 14:00, and 20:00. In addition, time was standardized to obtain two feature variables for model input, namely hours and days. The timestamp of each low-cost sensor record was extracted, expressed as the hour of the day and the day of the year, normalized, and mapped to a 2π period. For seasonal data, the training dataset included approximately 16,000 hourly observations from eight low-cost sensors, while the testing dataset comprised about 2000 observations from another sensor. The annual model’s hourly training dataset contained approximately 64,000 observations, with 8000 being used for testing. The daily average training dataset for the annual model consisted of approximately 2920 observations, and the testing dataset included 365 observations, as shown in Figure 3.
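The time features and the four-observation daily mean described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' code: the function names (`hour_phase`, `day_phase`, `daily_mean`) are hypothetical, and the exact normalization used in the study is not given in the text, so a simple linear mapping onto [0, 2π) is assumed here.

```python
import calendar
import math
from datetime import datetime

# The four observation times (local hours) used for the daily average.
SYNOPTIC_HOURS = (2, 8, 14, 20)


def hour_phase(ts: datetime) -> float:
    """Map the hour of the day onto [0, 2*pi). Assumed normalization."""
    return 2 * math.pi * ts.hour / 24


def day_phase(ts: datetime) -> float:
    """Map the day of the year onto [0, 2*pi). Assumed normalization."""
    days_in_year = 366 if calendar.isleap(ts.year) else 365
    return 2 * math.pi * (ts.timetuple().tm_yday - 1) / days_in_year


def daily_mean(hourly: dict) -> float:
    """Average the readings at 02:00, 08:00, 14:00, and 20:00.

    `hourly` maps hour of day (0-23) to air temperature in deg C.
    """
    return sum(hourly[h] for h in SYNOPTIC_HOURS) / len(SYNOPTIC_HOURS)


stamp = datetime(2022, 1, 1, 6)
features = (hour_phase(stamp), day_phase(stamp))
mean_temp = daily_mean({2: 10.0, 8: 12.0, 14: 20.0, 20: 14.0})
```

A common variant passes the sine and cosine of each phase to the model so that 23:00 and 00:00 end up close together; whether the study did so is not stated.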
This study employed MLR and LightGBM methods for calibration, with model performance being validated using LOOCV. MLR, a basic yet powerful model, was used to examine the relationships between the explanatory variables and the dependent variable, establishing an additive linear relationship when multiple explanatory variables were involved. However, MLR’s sensitivity to outliers and model instability were noted as potential drawbacks. LightGBM is an adaptive gradient-boosting model that has received widespread attention since it was proposed in 2017 [52]. LightGBM is based on the gradient boosting decision tree (GBDT) algorithm, which uses decision trees as base learners and uses the negative gradient of the loss function as the residual approximation of the current decision tree to fit a new decision tree in the next iteration. Compared with the GBDT method, LightGBM has multiple technical improvements, making it superior in efficiency and accuracy.
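The residual-fitting idea behind GBDT can be illustrated with a small NumPy sketch. It is deliberately not LightGBM: it uses depth-one trees (stumps), squared-error loss, and none of LightGBM's efficiency techniques. All names (`fit_stump`, `fit_gbdt`, `predict_gbdt`) and the toy data are hypothetical. For squared-error loss, the negative gradient is simply the current residual, which each new tree is fitted to.

```python
import numpy as np


def fit_stump(x, residual):
    """Find the one-feature threshold split minimising squared error."""
    best = None
    for j in range(x.shape[1]):
        # Exclude the largest value so the right-hand side is never empty.
        for t in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= t
            lv, rv = residual[left].mean(), residual[~left].mean()
            err = ((residual[left] - lv) ** 2).sum() + ((residual[~left] - rv) ** 2).sum()
            if best is None or err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]


def predict_stump(stump, x):
    j, t, lv, rv = stump
    return np.where(x[:, j] <= t, lv, rv)


def fit_gbdt(x, y, n_rounds=50, learning_rate=0.1):
    """Boost stumps on the residual of the running prediction."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of 0.5 * (y - pred)^2
        stump = fit_stump(x, residual)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, learning_rate


def predict_gbdt(model, x):
    base, stumps, lr = model
    out = np.full(x.shape[0], base)
    for stump in stumps:
        out = out + lr * predict_stump(stump, x)
    return out


# Toy data: a step function that a single split can represent.
x_demo = np.linspace(0.0, 1.0, 40).reshape(-1, 1)
y_demo = np.where(x_demo[:, 0] > 0.5, 3.0, 1.0)
model = fit_gbdt(x_demo, y_demo)
mse = float(np.mean((predict_gbdt(model, x_demo) - y_demo) ** 2))
```

In practice the LightGBM library would be used instead; the sketch only shows why each round reduces the remaining error.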
Both models were applied to calibrate the raw air temperature data collected by the nine sensors, with the trained models being evaluated using test datasets. The LOOCV approach ensured accurate results by using eight sites as training data and the remaining site as testing data in each iteration, maximizing the use of the available data for validation. In this experiment, nine standard meteorological stations served as reference points, with a low-cost sensor being installed near each station. The air temperature data collected by the standard meteorological stations were used as labels during the model fitting process. To enhance the model’s generalization capability, the training input included only data collected by the low-cost sensors. These inputs comprised temperature, humidity, air pressure, and the previously mentioned time variables, hour and day. This approach ensures that the calibrated model, following comprehensive evaluation, can be effectively applied in areas lacking standard meteorological stations, enabling the efficient calibration of a large number of low-cost sensors.
This experiment employed the LOOCV method for verification. In this approach, with N total samples, N-1 samples are used as training data for each iteration, and the remaining sample serves as the testing data, resulting in N rounds of training and validation. The key advantage of LOOCV is its ability to fully utilize a given dataset, producing results that are both comprehensive and accurate. This method effectively evaluates the model’s generalization ability across different samples while minimizing the impact of randomness, as every sample is used as part of the validation set. This eliminates biases caused by specific data splits and ensures robust performance evaluation. Given the relatively limited sample size in this experiment and the critical importance of accurate calibration for each sensor, LOOCV was deemed highly suitable. Specifically, data from eight stations were used for training in each iteration, while data from the remaining station were used for testing, totalling nine rounds of training and verification.
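The site-wise validation loop described above can be sketched as follows, using an ordinary least-squares MLR as the calibration model. This is a minimal sketch under stated assumptions: the helper names (`fit_mlr`, `predict_mlr`, `leave_one_site_out`) are hypothetical, the data are synthetic and noise-free, and only three of the input features (temperature, humidity, pressure) are simulated.

```python
import numpy as np


def fit_mlr(x, y):
    """Ordinary least squares with an intercept column."""
    design = np.column_stack([np.ones(len(x)), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict_mlr(coef, x):
    return np.column_stack([np.ones(len(x)), x]) @ coef


def leave_one_site_out(x, y, site_ids):
    """Train on all sites but one, test on the held-out site, repeat per site.

    Returns a dict mapping the held-out site id to its test MAE.
    """
    scores = {}
    for site in np.unique(site_ids):
        held_out = site_ids == site
        coef = fit_mlr(x[~held_out], y[~held_out])
        pred = predict_mlr(coef, x[held_out])
        scores[int(site)] = float(np.mean(np.abs(pred - y[held_out])))
    return scores


# Synthetic stand-in for nine co-located sensor/station pairs.
rng = np.random.default_rng(0)
n_sites, n_per_site = 9, 50
site_ids = np.repeat(np.arange(1, n_sites + 1), n_per_site)
temp = rng.uniform(0, 40, site_ids.size)      # sensor temperature, deg C
hum = rng.uniform(20, 100, site_ids.size)     # relative humidity, %
pres = rng.uniform(990, 1030, site_ids.size)  # air pressure, hPa
x_demo = np.column_stack([temp, hum, pres])
y_demo = 0.8 * temp - 0.02 * hum + 1.5        # made-up, noise-free reference
scores = leave_one_site_out(x_demo, y_demo, site_ids)
```

Splitting by site rather than by individual observation is what lets each round measure how well the model transfers to a location it has never seen.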
The spatial generalization capability of the model is crucial in calibrating low-cost sensors, as these are installed in areas that lack standard monitoring stations or where relevant data are unavailable. In this experiment, the locations of the training and testing data differ. By recording the surrounding scenes during sensor deployment in the field, the nine sites were categorized into five typical surface types, as shown in Table 2. This study focused on annual data (hourly level) and selected three surface types, namely cultivated land, shrubland, and woodlands, for comparative analysis. The experiment specified one site as the testing data and evaluated the impact of various scenes and distances by selecting data from several sites for training. The experiment was divided into three groups to assess the influence of the surface types and distances on calibration. First, for sites with similar straight-line distances to the testing site, the influence of different surface types on the calibration accuracy was evaluated. Training data were selected from two sites, one sharing the same surface type as the testing site and the other with a different surface type. Second, for sites with the same surface type, the effect of varying the straight-line distances between the testing and training sites was examined. Due to a limited number of sites with identical surface types, this comparison was only feasible for cultivated land. Third, the experiment considered variations in both the surface type and straight-line distance. One site was designated for testing, while the remaining sites were sorted by distance and used as training data. All calculations were performed in Python 3.7.7, and all code was written and debugged in the PyCharm 2020.1 (x64) integrated development environment.

2.3. Evaluation Metrics

We developed MLR and LightGBM models to calibrate nine low-cost sensors deployed at different locations and used LOOCV for verification to assess the spatial generalization capability of the models. The experiment considered the influence of multiple scenes and distances to provide a comprehensive evaluation of the calibration methods’ performance across different scenarios. The evaluation metrics used in this study include R2, RMSE, and MAE. The R2 value was computed using Equation (1).
$$R^2 = \left( \frac{\sum_{i=1}^{n} (Y_i - \bar{Y})(X_i - \bar{X})}{\sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2 \sum_{i=1}^{n} (X_i - \bar{X})^2}} \right)^2 \tag{1}$$
where X represents the value estimated by the model, Y indicates the reference value from the standard weather station, X ¯ signifies the average of the values estimated by the model, Y ¯ corresponds to the average of the reference values, and n is the number of samples. The closer the R2 value is to 1, the better the model calibration. The RMSE was calculated using Equation (2).
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - X_i)^2}{n}} \tag{2}$$
where the closer the RMSE is to 0, the better the model calibration. The MAE was calculated using Equation (3).
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| Y_i - X_i \right| \tag{3}$$
where the closer the MAE is to 0, the better the calibration of the model. Using the original data monitored by low-cost sensors and the reference data from standard meteorological stations, the accuracy of the low-cost sensor data calibrated by different models is evaluated. The evaluation using these three indicators effectively demonstrated the improvement in the low-cost sensor data after correction with our method.
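The three metrics can be computed directly from their definitions, with X the estimated values and Y the reference values. The sketch below uses only the standard library; the function names and the three-point example are illustrative and are not taken from the study.

```python
import math


def r_squared(x, y):
    """Squared Pearson correlation between estimates x and references y (Equation (1))."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    cov = sum((yi - y_bar) * (xi - x_bar) for xi, yi in zip(x, y))
    s_yy = sum((yi - y_bar) ** 2 for yi in y)
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    return (cov / math.sqrt(s_yy * s_xx)) ** 2


def rmse(x, y):
    """Root mean square error (Equation (2))."""
    return math.sqrt(sum((yi - xi) ** 2 for xi, yi in zip(x, y)) / len(x))


def mae(x, y):
    """Mean absolute error (Equation (3))."""
    return sum(abs(yi - xi) for xi, yi in zip(x, y)) / len(x)


# Illustrative values: estimates are exactly half of the references.
estimates = [1.0, 2.0, 3.0]
references = [2.0, 4.0, 6.0]
metrics = (r_squared(estimates, references), rmse(estimates, references), mae(estimates, references))
```

In this example R2 is 1 even though every estimate is wrong, because Equation (1) is a squared correlation and is insensitive to scale and offset errors. This is why RMSE and MAE are needed alongside it.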

3. Results

3.1. Differences Between Sensors and Stations

The statistics of the raw air temperature data collected by the low-cost integrated sensors and standard meteorological stations during the experiment are presented in Table 3. The comparison, conducted at the hourly level from December 2021 to November 2022, shows that the median and average values of the low-cost sensors and the standard meteorological stations are similar, with a small difference in the minimum values. However, the maximum values recorded by the low-cost sensors were notably higher than those of the nearby standard meteorological stations. These statistical results indicate a significant deviation when comparing the hourly air temperature data from the low-cost sensors to the reference data from the standard meteorological stations.
The differences between the low-cost sensors and the standard weather stations at the nine sites are illustrated in Figure 4. We used annual data (hourly level), of which each sensor had more than 8000 samples. The R2 value ranged from 0.416 to 0.566. Despite the sensors being located at different sites, their data performance remained relatively consistent. This comparison indicates that, while there are noticeable deviations between the low-cost sensors and the standard weather stations, the long-term trends of the data align closely. Therefore, it is both necessary and feasible to improve the quality of data obtained from low-cost sensors through effective calibration methods.

3.2. Calibration and Verification Results

To evaluate the performance of the MLR and LightGBM models, we analyzed the calibration results over different time scales (seasonal and annual) and for different data types (hourly and daily averages). We also considered the impact of multiple scenes and distances to comprehensively assess the spatial generalization ability of the calibration method.
The verification results of the calibration of hourly data from across the four seasons are shown in Figure 5. Each column represents a site, while each row corresponds to a specific season. The histograms in each subfigure represent the original MAE and the MAE results after the MLR and LightGBM calibration. Compared to the original data, both the MLR and LightGBM calibration methods improved the MAE value, with the LightGBM model showing significant advantages. The results also reveal that the original MAE for the stations is better in winter than in the other three seasons. This could be due to the lower air temperature in winter, which reduces the overestimation of high values by low-cost sensors. Overall, the LightGBM method performed well at all sites and in all seasons, confirming the effectiveness of ML-based calibration for hourly seasonal data collected by low-cost sensors at different spatial locations.
To further demonstrate the effectiveness of the LightGBM calibration method, several days were randomly selected from each of the four seasons, and the temporal variation in the data before and after calibration is shown in Figure 6. Hourly air temperature data from consecutive days in four seasons were selected to illustrate how the air temperature data from standard weather stations and low-cost sensors change over time before and after calibration. Site No. 5 was randomly chosen as a demonstration. As can be seen in the middle of Figure 6a, the raw data from the low-cost sensor show abnormal performance, with little variation over time. In this extreme case, the LightGBM calibration method still performs well.
Next, we validated the calibration performance of the MLR and LightGBM models on annual hourly air temperature data. Figure 7 presents the raw and calibrated R2, MAE, and RMSE values for the low-cost sensors and standard weather stations at nine sites. After calibration with our LightGBM method, the R2 value improved from 0.416–0.566 to 0.914–0.969, the MAE value improved from 4.934–6.255 to 1.370–2.364, and the RMSE value improved from 6.678–7.881 to 1.782–3.007. Notably, the lowest original R2 (0.416) was at Site No. 8, which the MLR model improved to 0.712 and the LightGBM model further enhanced to 0.957. These results show that our method offers strong calibration capabilities for hourly temperature data on an annual scale. The LightGBM-based calibration method performed the best, significantly improving the quality of the data from the low-cost sensors and demonstrating clear advantages over the MLR-based calibration method.
Although overall evaluation metrics are crucial, the performance of the calibration results over time is also important. Therefore, we present box plots of air temperature (hourly level) by month for low-cost sensors, standard weather stations, and LightGBM calibration in Figure 8. To account for the different spatial locations, we refer to Figure 2, which displays the results for Site No. 2 (southernmost), Site No. 4 (northernmost), and Site No. 3 (central and eastern part). The box plot shows that the quality of low-cost sensor data significantly improves after calibration. Over time, the original errors become more pronounced in the months with higher air temperatures, although the calibration effect was relatively stable. In addition, the comparison results of the annual data (hourly level) of nine sensors at different locations and the data of standard weather stations before calibration are shown in Figure 9a. The model performance after the LightGBM calibration is shown in Figure 9b, where the overall R2 can be seen to improve from 0.498 to 0.954, indicating a significant enhancement.
The effectiveness of the ML calibration method in enhancing the quality of long-term data from low-cost sensors is demonstrated in Figure 10. Four spatially dispersed stations (Nos. 2, 3, 8, and 9) were randomly selected, and their daily average temperatures over 12 months are displayed. Each sub-figure presents the original low-cost sensor data, the reference data from the standard meteorological stations, and the LightGBM-calibrated daily average data. While the trends of the raw low-cost sensor data (blue) are similar to those of the reference data (green), noticeable value discrepancies exist. The LightGBM-calibrated results (red) align closely with the reference data, both in their trends and values. These findings highlight the excellent and stable performance of the ML model in calibrating long-term daily average data.

3.3. Differences in Multiple Scenes

To assess the effects of varying scenes and distances on the calibration performance of the tested models, three experiments were designed: (1) evaluating the impact of surface types when the straight-line distances between the training and testing sites are similar; (2) assessing the effect of different distances when the surface types are identical; (3) comparing the results when the surface types differ, with the distances sorted from nearest to farthest.
Firstly, when the testing and training sites have comparable straight-line distances, models trained with data from the same surface type as the testing site demonstrated superior calibration performance. As presented in Table 4, when calibrating sensor No. 5, which was located in cultivated land, two sites with comparable distances from the testing location were selected: Site No. 3, 11 km away, and Site No. 7, 10 km away. Among these sites, Site No. 3, which shares the same surface type as the testing site, yielded better R2 results. Similarly, for sensor No. 6, located in woodlands, training data were selected from Site No. 1, 9 km away, and Site No. 9, 10 km away. Site No. 1, which shares the same surface type as the testing site, provided superior calibration outcomes. When calibrating sensor No. 8, located in shrubland, Site No. 9, 17 km away, and Site No. 4, 18 km away, were used as training data. Site No. 9, having the same surface type as the testing site, showed slightly better results.
Secondly, when the surface types of the training and testing sites are the same, closer straight-line distances result in better calibration performance. For instance, training sites No. 3 and No. 4, located at different distances from testing site No. 5, were used to establish the calibration model. The results in Table 4 indicate that the site closer to the testing site (No. 3) provided a clear advantage in terms of its calibration accuracy.

3.4. Performance of High Values

The experimental results indicate that air temperature data recorded by low-cost sensors tend to be overestimated compared to those from standard weather stations, particularly during high-temperature periods. To analyze this further, air temperature data (hourly level) from nine low-cost sensors across various spatial locations were aggregated over 12 months. The observed temperature range for the low-cost sensors was from −5.346 °C to 49.813 °C, so the value range was set to −6–50 °C and divided into seven value intervals, as shown in Table 5. The distribution of the samples within these intervals revealed that the highest temperature interval contained the fewest samples and exhibited poor evaluation metric outcomes. However, it cannot be ignored that the low temperature interval also performed poorly. Consequently, the calibration performance of the LightGBM method was also affected by the quality of the original data, particularly in the more extreme temperature ranges.
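The per-interval evaluation can be sketched as below. The text gives the overall range (−6 to 50 °C) and the number of intervals (seven) but not the interval boundaries, so equal 8 °C widths are an assumption here, as are the function names and the example values.

```python
def interval_index(temp_c, low=-6.0, high=50.0, n_bins=7):
    """Return the 0-based interval of a sensor temperature.

    Equal-width intervals are an assumption; values outside the range are
    clamped to the first or last interval.
    """
    width = (high - low) / n_bins
    idx = int((temp_c - low) // width)
    return min(max(idx, 0), n_bins - 1)


def mae_by_interval(sensor, reference, estimate):
    """Group samples by sensor-temperature interval.

    Returns {interval: (sample_count, MAE of estimate against reference)}.
    """
    sums = {}
    for s, r, e in zip(sensor, reference, estimate):
        idx = interval_index(s)
        count, total = sums.get(idx, (0, 0.0))
        sums[idx] = (count + 1, total + abs(r - e))
    return {idx: (count, total / count) for idx, (count, total) in sums.items()}


# Made-up readings: two cool samples and one hot sample.
stats = mae_by_interval(
    sensor=[0.0, 1.0, 45.0],
    reference=[1.0, 1.0, 40.0],
    estimate=[0.5, 1.0, 42.0],
)
```

Reporting the sample count next to each interval's error makes it easier to see when a poor score comes from a sparsely populated interval.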

4. Discussion

In this study, we used LightGBM to calibrate long-term air temperature data collected by low-cost sensors. ML methods have been widely used in the field of low-cost sensor data processing. However, previous research has mainly focused on low-cost particulate matter sensors [36,45], and low-cost meteorological sensors have received comparatively little attention. Nevertheless, the dense meteorological data obtained by these sensors hold immense value for diverse studies, including studies on urban heat islands and crop modeling. These data also play a vital role in urban monitoring and management [25,27], as well as the health and daily lives of urban residents.
Nine sites across Wuhan were utilized as representative locations to calibrate air temperature data from low-cost sensors. To enhance the calibration for long-term data, MLR and LightGBM methods were used, with comparisons being conducted across three data types: seasonal data (hourly level), annual data (hourly level), and annual data (daily average level). The three metrics, R2, MAE, and RMSE, revealed that the LightGBM model consistently delivers superior and more stable performance. The LOOCV approach, which allows each sample to serve as a validation set, further underscored the model’s reliability by minimizing the bias associated with specific divisions.
This study also investigated the effects of multiple scenes and distances on the resulting calibration through three experimental scenarios. The results indicated that models which were calibrated using training sites with the same surface type as the testing sites performed better when the distances were similar. However, due to a limited number of sites with similar surface types, there are insufficient experiments on the impact of the distance when the surface types are similar.
Previous studies suggest that closer distances generally improve the verification results, while distant sites may negatively affect the resulting performance. However, calibration experiments often overlook real-world field conditions and variations in distance. This study indicates that distance alone does not have a decisive impact when both the surface type and distance differ. Contrary to the assumption that shorter distances yield better calibration, models applied to more distant sites can still demonstrate superior transfer performance. Therefore, the effective calibration of low-cost sensor models requires a comprehensive evaluation of diverse distances and surface types. Due to the limited availability of sites with uniform surface types, only cultivated land areas were compared, which introduced certain limitations to the findings. Nonetheless, the results remain valuable. When the surface types differed between the training and testing sites, the training sites were sorted from near to far according to their straight-line distance to the testing site. The calibration results did not consistently improve with shorter distances, confirming that distance is not a decisive factor in calibration performance when the surface types are different.
Table 4 indicates that the calibration model developed at Site No. 2, located on grasslands, performed irregularly when calibrating the three low-cost sensors across different surface types. For instance, despite being the farthest site and having a different surface type, Site No. 2 yielded the highest R2 (0.979) when calibrating the sensor at Site No. 6. Figure 2 shows that Site No. 2 resembles a playground, and the field deployment records describe it as a man-made horse farm surfaced with synthetic turf rather than natural grass. Unlike natural lawns, this artificial turf is unaffected by seasonal changes and is strongly affected by horse activity and human intervention. Future research should avoid such unusual deployment environments so that site selection is more representative and generalizable.
To fully understand the original data and effectively evaluate the calibration model’s performance, no special adjustments were made for low and high values. However, as shown in Table 5, data at the low and high ends of the temperature range have an adverse effect: the original R2 falls to −1.246 in the [−6, 2] °C range and to −6.653 in the [42, 50] °C range. Future research should address this issue by increasing the number of sites and samples and by incorporating pre-correction methods, such as eliminating or smoothing high-value data, particularly in high-temperature ranges.
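The pre-correction step suggested above could take many forms. As one minimal sketch, the function below replaces readings above a threshold with a centered rolling median of their neighbours. The function name `smooth_high_values`, the 42 °C threshold, the five-sample window, and the sample series are all hypothetical choices for illustration. The study itself applied no such adjustment.

```python
from statistics import median

def smooth_high_values(series, threshold=42.0, window=5):
    """Replace readings above `threshold` with a centered rolling median.

    Readings at or below the threshold are returned unchanged. Both the
    threshold and the window length are hypothetical tuning parameters.
    """
    half = window // 2
    smoothed = list(series)
    for i, value in enumerate(series):
        if value > threshold:
            lo = max(0, i - half)
            hi = min(len(series), i + half + 1)
            # The median is taken over the original readings, spike included.
            smoothed[i] = median(series[lo:hi])
    return smoothed

# Synthetic hourly sensor readings (°C) with two isolated high spikes.
hourly = [33.1, 34.0, 47.9, 35.2, 36.0, 36.4, 48.6, 36.1, 35.0]
print(smooth_high_values(hourly))
```

A median is used rather than a mean so that the spike itself has little influence on its replacement. During a sustained hot spell most of the window would exceed the threshold and the median would stay high, so this kind of filter only suppresses short-lived excursions.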
The key contributions of this study can be summarized as follows. First, the developed low-cost sensors were tested under multiple complex, real-world field conditions with long-term on-site monitoring. Second, seasonal and annual models were constructed to assess the performance of the calibration method over various time scales, using air temperature data at hourly and daily-average resolution. Third, the spatial generalizability of the model was evaluated across multiple scenarios and distances. Fourth, the model operates independently of standard station data, significantly improving experimental efficiency and enabling calibration in regions without standard monitoring stations. Overall, the LightGBM calibration method performed best in improving data quality. The proposed approach is effective and scalable, offering practical value and contributing to the widespread application of low-cost sensors.

5. Conclusions

This study proposes and evaluates an efficient method for calibrating air temperature measurements from low-cost sensors under real field conditions. Data were collected from multiple spatially distributed sites to assess the method across variations in time, distance, and environmental conditions. The validation results from the seasonal and annual models demonstrate that the proposed LightGBM method markedly improves the quality of low-cost sensor data and maintains stable performance across datasets of different time scales. Notably, the proposed calibration model remains effective in areas that lack a nearby standard meteorological station, enabling low-cost sensors to be calibrated without relying on official data. This highlights the model’s practicality and adaptability in resource-limited regions. The study focuses on identifying the most suitable calibration method for self-developed, integrated low-cost meteorological sensors and lays the groundwork for further advances, such as integrating remote-sensing data for near-surface air temperature inversion. Future research should obtain more datasets from sites with similar surface types in different geographical locations and consider additional environmental factors to improve the method. Overall, the proposed method is valuable for calibrating air temperature measurements from low-cost sensors in a range of practical scenarios, improving data quality and promoting confidence in the application and scalability of such sensors.

Author Contributions

Conceptualization, F.N., C.Z. and H.S.; methodology, F.N., C.Z. and L.L.; software, F.N. and L.L.; validation, F.N. and C.Z.; data curation, F.N. and H.S.; writing—original draft preparation, F.N., C.Z. and L.L.; writing—review and editing, F.N., C.Z., L.L. and H.S.; supervision, H.S. and C.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Grant No. 42271488) and the Key Research and Development Program of Hubei Province (Grant No. 2023BAB066).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Acknowledgments

The authors would like to thank Ganghua Ni, Tian Xie, Guanglei Xie, Guang Hu, Rui Wang, and Bingxin Zou from Wuhan University, China, for their support during fieldwork. We also extend our gratitude to Jing Yan from the Hubei Provincial Meteorological Bureau for their valuable contributions to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yamamoto, K.; Togami, T.; Yamaguchi, N.; Ninomiya, S. Machine Learning-Based Calibration of Low-Cost Air Temperature Sensors Using Environmental Data. Sensors 2017, 17, 1290. [Google Scholar] [CrossRef] [PubMed]
  2. Swamy, G.; Erva, R.R.; Pujari, M.; Kodavaty, J. An Overview on Patterns, Monitoring, and Modeling of the Urban Climate Changes. Phys. Chem. Earth Parts A/B/C 2024, 135, 103625. [Google Scholar] [CrossRef]
  3. Hernández-Gordillo, A.; Ruiz-Correa, S.; Robledo-Valero, V.; Hernández-Rosales, C.; Arriaga, S. Recent Advancements in Low-Cost Portable Sensors for Urban and Indoor Air Quality Monitoring. Air Qual. Atmos. Health 2021, 14, 1931–1951. [Google Scholar] [CrossRef]
  4. Kolokotroni, M.; Giridharan, R. Urban Heat Island Intensity in London: An Investigation of the Impact of Physical Characteristics on Changes in Outdoor Air Temperature during Summer. Sol. Energy 2008, 82, 986–998. [Google Scholar] [CrossRef]
  5. Gago, E.J.; Roldan, J.; Pacheco-Torres, R.; Ordóñez, J. The City and Urban Heat Islands: A Review of Strategies to Mitigate Adverse Effects. Renew. Sustain. Energy Rev. 2013, 25, 749–758. [Google Scholar] [CrossRef]
  6. Evans, B.; Elisei, P.; Rosenfeld, O.; Roll, G.; Figueiredo, A.; Keiner, M. HABITAT III—Toward a New Urban Agenda. Disp—Plan. Rev. 2016, 52, 86–91. [Google Scholar] [CrossRef]
  7. Voukkali, I.; Papamichael, I.; Loizia, P.; Zorpas, A.A. Urbanization and Solid Waste Production: Prospects and Challenges. Env. Sci. Pollut. Res. 2024, 31, 17678–17689. [Google Scholar] [CrossRef]
  8. Jamei, E.; Rajagopalan, P.; Seyedmahmoudian, M.; Jamei, Y. Review on the Impact of Urban Geometry and Pedestrian Level Greening on Outdoor Thermal Comfort. Renew. Sustain. Energy Rev. 2016, 54, 1002–1017. [Google Scholar] [CrossRef]
  9. Wolkoff, P. Indoor Air Humidity, Air Quality, and Health—An Overview. Int. J. Hyg. Environ. Health 2018, 221, 376–390. [Google Scholar] [CrossRef]
  10. Ryan, I.; Deng, X.; Thurston, G.; Khwaja, H.; Romeiko, X.; Zhang, W.; Marks, T.; Yu, F.; Lin, S. Measuring Students’ Exposure to Temperature and Relative Humidity in Various Indoor Environments and across Seasons Using Personal Air Monitors. Hyg. Environ. Health Adv. 2022, 4, 100029. [Google Scholar] [CrossRef]
  11. Aram, F.; Higueras García, E.; Solgi, E.; Mansournia, S. Urban Green Space Cooling Effect in Cities. Heliyon 2019, 5, e01339. [Google Scholar] [CrossRef] [PubMed]
  12. Yu, Z.; Chen, S.; Wong, N.H.; Ignatius, M.; Deng, J.; He, Y.; Hii, D.J.C. Dependence between Urban Morphology and Outdoor Air Temperature: A Tropical Campus Study Using Random Forests Algorithm. Sustain. Cities Soc. 2020, 61, 102200. [Google Scholar] [CrossRef]
  13. Wong, N.H.; Tan, C.L.; Kolokotsa, D.D.; Takebayashi, H. Greenery as a Mitigation and Adaptation Strategy to Urban Heat. Nat. Rev. Earth Env. 2021, 2, 166–181. [Google Scholar] [CrossRef]
  14. Schmidt, V. Urban Morphology as a Key Parameter for Mitigating Urban Heat?—A Literature Review. IOP Conf. Ser. Earth Environ. Sci. 2024, 1363, 012074. [Google Scholar] [CrossRef]
  15. AghaKouchak, A.; Farahmand, A.; Melton, F.S.; Teixeira, J.; Anderson, M.C.; Wardlow, B.D.; Hain, C.R. Remote Sensing of Drought: Progress, Challenges and Opportunities. Rev. Geophys. 2015, 53, 452–480. [Google Scholar] [CrossRef]
  16. Aflaki, A.; Mirnezhad, M.; Ghaffarianhoseini, A.; Ghaffarianhoseini, A.; Omrany, H.; Wang, Z.-H.; Akbari, H. Urban Heat Island Mitigation Strategies: A State-of-the-Art Review on Kuala Lumpur, Singapore and Hong Kong. Cities 2017, 62, 131–145. [Google Scholar] [CrossRef]
  17. Tikle, S.; Anand, V.; Das, S. Geospatial Practices for Airpollution and Meteorological Monitoring, Prediction, and Forecasting. In Geospatial Practices in Natural Resources Management; Shit, P.K., Dutta, D., Das, T.K., Das, S., Bhunia, G.S., Das, P., Sahoo, S., Eds.; Springer International Publishing: Cham, Switzerland, 2024; pp. 549–566. ISBN 978-3-031-38004-4. [Google Scholar]
  18. Morawska, L.; Thai, P.K.; Liu, X.; Asumadu-Sakyi, A.; Ayoko, G.; Bartonova, A.; Bedini, A.; Chai, F.; Christensen, B.; Dunbabin, M.; et al. Applications of Low-Cost Sensing Technologies for Air Quality Monitoring and Exposure Assessment: How Far Have They Gone? Environ. Int. 2018, 116, 286–299. [Google Scholar] [CrossRef]
  19. Kang, Y.; Aye, L.; Ngo, T.D.; Zhou, J. Performance Evaluation of Low-Cost Air Quality Sensors: A Review. Sci. Total Environ. 2022, 818, 151769. [Google Scholar] [CrossRef]
  20. Chow, J.C. Measurement Methods to Determine Compliance with Ambient Air Quality Standards for Suspended Particles. J. Air Waste Manag. Assoc. 1995, 45, 320–382. [Google Scholar] [CrossRef]
  21. Chong, C.-Y.; Kumar, S.P. Sensor Networks: Evolution, Opportunities, and Challenges. Proc. IEEE 2003, 91, 1247–1256. [Google Scholar] [CrossRef]
  22. Idrees, Z.; Zheng, L. Low Cost Air Pollution Monitoring Systems: A Review of Protocols and Enabling Technologies. J. Ind. Inf. Integr. 2020, 17, 100123. [Google Scholar] [CrossRef]
  23. Kumar, P.; Morawska, L.; Martani, C.; Biskos, G.; Neophytou, M.; Di Sabatino, S.; Bell, M.; Norford, L.; Britter, R. The Rise of Low-Cost Sensing for Managing Air Pollution in Cities. Environ. Int. 2015, 75, 199–205. [Google Scholar] [CrossRef] [PubMed]
  24. Crilley, L.R.; Shaw, M.; Pound, R.; Kramer, L.J.; Price, R.; Young, S.; Lewis, A.C.; Pope, F.D. Evaluation of a Low-Cost Optical Particle Counter (Alphasense OPC-N2) for Ambient Air Monitoring. Atmos. Meas. Tech. 2018, 11, 709–720. [Google Scholar] [CrossRef]
  25. Ródenas García, M.; Spinazzé, A.; Branco, P.T.; Borghi, F.; Villena, G.; Cattaneo, A.; Di Gilio, A.; Mihucz, V.G.; Gómez Álvarez, E.; Lopes, S.I.; et al. Review of Low-Cost Sensors for Indoor Air Quality: Features and Applications. Appl. Spectrosc. Rev. 2022, 57, 747–779. [Google Scholar] [CrossRef]
  26. Radić, J.; Brkić, M.; Keser, T.; Obrovski, B.; Mihajlović, I.; Toskić, M.V. Distributed Wireless IoT Based Sensing and Quality Monitoring System in Protection of Wetlands Groundwater Areas. Meas. Sci. Technol. 2024, 35, 125110. [Google Scholar] [CrossRef]
  27. Gao, M.; Cao, J.; Seto, E. A Distributed Network of Low-Cost Continuous Reading Sensors to Measure Spatiotemporal Variations of PM2.5 in Xi’an, China. Environ. Pollut. 2015, 199, 56–65. [Google Scholar] [CrossRef]
  28. Heimann, I.; Bright, V.B.; McLeod, M.W.; Mead, M.I.; Popoola, O.A.M.; Stewart, G.B.; Jones, R.L. Source Attribution of Air Pollution by Spatial Scale Separation Using High Spatial Density Networks of Low Cost Air Quality Sensors. Atmos. Environ. 2015, 113, 10–19. [Google Scholar] [CrossRef]
  29. Mead, M.I.; Popoola, O.A.M.; Stewart, G.B.; Landshoff, P.; Calleja, M.; Hayes, M.; Baldovi, J.J.; McLeod, M.W.; Hodgson, T.F.; Dicks, J.; et al. The Use of Electrochemical Sensors for Monitoring Urban Air Quality in Low-Cost, High-Density Networks. Atmos. Environ. 2013, 70, 186–203. [Google Scholar] [CrossRef]
  30. Zheng, T.; Bergin, M.H.; Johnson, K.K.; Tripathi, S.N.; Shirodkar, S.; Landis, M.S.; Sutaria, R.; Carlson, D.E. Field Evaluation of Low-Cost Particulate Matter Sensors in High- and Low-Concentration Environments. Atmos. Meas. Tech. 2018, 11, 4823–4846. [Google Scholar] [CrossRef]
  31. Rai, A.C.; Kumar, P.; Pilla, F.; Skouloudis, A.N.; Di Sabatino, S.; Ratti, C.; Yasar, A.; Rickerby, D. End-User Perspective of Low-Cost Sensors for Outdoor Air Pollution Monitoring. Sci. Total Environ. 2017, 607–608, 691–705. [Google Scholar] [CrossRef]
  32. Castell, N.; Dauge, F.R.; Schneider, P.; Vogt, M.; Lerner, U.; Fishbain, B.; Broday, D.; Bartonova, A. Can Commercial Low-Cost Sensor Platforms Contribute to Air Quality Monitoring and Exposure Estimates? Environ. Int. 2017, 99, 293–302. [Google Scholar] [CrossRef] [PubMed]
  33. Abdulkarem, M.; Samsudin, K.; Rokhani, F.Z.; Rasid, M.F.A. Wireless Sensor Network for Structural Health Monitoring: A Contemporary Review of Technologies, Challenges, and Future Direction. Struct. Health Monit. 2020, 19, 693–735. [Google Scholar] [CrossRef]
  34. Jiao, W.; Hagler, G.; Williams, R.; Sharpe, R.; Brown, R.; Garver, D.; Judge, R.; Caudill, M.; Rickard, J.; Davis, M.; et al. Community Air Sensor Network (CAIRSENSE) Project: Evaluation of Low-Cost Sensor Performance in a Suburban Environment in the Southeastern United States. Atmos. Meas. Tech. 2016, 9, 5281–5292. [Google Scholar] [CrossRef]
  35. Maag, B.; Zhou, Z.; Thiele, L. A Survey on Sensor Calibration in Air Pollution Monitoring Deployments. IEEE Internet Things J. 2018, 5, 4857–4870. [Google Scholar] [CrossRef]
  36. Karagulian, F.; Barbiere, M.; Kotsev, A.; Spinelle, L.; Gerboles, M.; Lagler, F.; Redon, N.; Crunaire, S.; Borowiak, A. Review of the Performance of Low-Cost Sensors for Air Quality Monitoring. Atmosphere 2019, 10, 506. [Google Scholar] [CrossRef]
  37. Venkatraman Jagatha, J.; Klausnitzer, A.; Chacón-Mateos, M.; Laquai, B.; Nieuwkoop, E.; van der Mark, P.; Vogt, U.; Schneider, C. Calibration Method for Particulate Matter Low-Cost Sensors Used in Ambient Air Quality Monitoring and Research. Sensors 2021, 21, 3960. [Google Scholar] [CrossRef]
  38. González Rivero, R.A.; Schalm, O.; Alvarez Cruz, A.; Hernández Rodríguez, E.; Morales Pérez, M.C.; Alejo Sánchez, D.; Martinez Laguardia, A.; Jacobs, W.; Hernández Santana, L. Relevance and Reliability of Outdoor SO2 Monitoring in Low-Income Countries Using Low-Cost Sensors. Atmosphere 2023, 14, 912. [Google Scholar] [CrossRef]
  39. Zimmerman, N.; Presto, A.A.; Kumar, S.P.N.; Gu, J.; Hauryliuk, A.; Robinson, E.S.; Robinson, A.L.; Subramanian, R. A Machine Learning Calibration Model Using Random Forests to Improve Sensor Performance for Lower-Cost Air Quality Monitoring. Atmos. Meas. Tech. 2018, 11, 291–313. [Google Scholar] [CrossRef]
  40. Liang, L. Calibrating Low-Cost Sensors for Ambient Air Monitoring: Techniques, Trends, and Challenges. Environ. Res. 2021, 197, 111163. [Google Scholar] [CrossRef]
  41. Villanueva, E.; Espezua, S.; Castelar, G.; Diaz, K.; Ingaroca, E. Smart Multi-Sensor Calibration of Low-Cost Particulate Matter Monitors. Sensors 2023, 23, 3776. [Google Scholar] [CrossRef]
  42. Wang, A.; Machida, Y.; deSouza, P.; Mora, S.; Duhl, T.; Hudda, N.; Durant, J.L.; Duarte, F.; Ratti, C. Leveraging Machine Learning Algorithms to Advance Low-Cost Air Sensor Calibration in Stationary and Mobile Settings. Atmos. Environ. 2023, 301, 119692. [Google Scholar] [CrossRef]
  43. Li, J.; Zhang, H.; Chao, C.-Y.; Chien, C.-H.; Wu, C.-Y.; Luo, C.H.; Chen, L.-J.; Biswas, P. Integrating Low-Cost Air Quality Sensor Networks with Fixed and Satellite Monitoring Systems to Study Ground-Level PM2.5. Atmos. Environ. 2020, 223, 117293. [Google Scholar] [CrossRef]
  44. Kramer, A.L.; Liu, J.; Li, L.; Connolly, R.; Barbato, M.; Zhu, Y. Environmental Justice Analysis of Wildfire-Related PM2.5 Exposure Using Low-Cost Sensors in California. Sci. Total Environ. 2023, 856, 159218. [Google Scholar] [CrossRef]
  45. Stampfer, O.; Zuidema, C.; Allen, R.W.; Fox, J.; Sampson, P.; Seto, E.; Karr, C.J. Practical Considerations for Using Low-Cost Sensors to Assess Wildfire Smoke Exposure in School and Childcare Settings. J. Expo. Sci. Env. Epidemiol. 2024, 35, 157–168. [Google Scholar] [CrossRef]
  46. Liu, H.; Wang, B.; Sun, X.; Li, T.; Liu, Q.; Guo, Y. DCSCS: A Novel Approach to Improve Data Accuracy for Low Cost Meteorological Sensor Networks. Inf. Technol. J. 2014, 13, 1640–1647. [Google Scholar] [CrossRef]
  47. Grykałowska, A.; Kowal, A.; Szmyrka-Grzebyk, A. The Basics of Calibration Procedure and Estimation of Uncertainty Budget for Meteorological Temperature Sensors. Meteorol. Appl. 2015, 22, 867–872. [Google Scholar] [CrossRef]
  48. Sun, X.; Yan, S.; Wang, B.; Xia, L.; Liu, Q.; Zhang, H. Air Temperature Error Correction Based on Solar Radiation in an Economical Meteorological Wireless Sensor Network. Sensors 2015, 15, 18114–18139. [Google Scholar] [CrossRef]
  49. Tang, R.; Ning, Y.; Li, C.; Feng, W.; Chen, Y.; Xie, X. Numerical Forecast Correction of Temperature and Wind Using a Single-Station Single-Time Spatial LightGBM Method. Sensors 2022, 22, 193. [Google Scholar] [CrossRef]
  50. Cao, Q.; Wu, Y.; Yang, J.; Yin, J. Greenhouse Temperature Prediction Based on Time-Series Features and LightGBM. Appl. Sci. 2023, 13, 1610. [Google Scholar] [CrossRef]
  51. Nan, F.; Zeng, C.; Ni, G.; Zhou, M.; Shen, H. Development and Validation of Low-Cost IoT Environmental Sensors: A Case Study in Wuhan, China. IEEE Sens. J. 2023, 23, 3069–3078. [Google Scholar] [CrossRef]
  52. Ke, G.; Meng, Q.; Finley, T.; Wang, T.; Chen, W.; Ma, W.; Ye, Q.; Liu, T.-Y. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Curran Associates, Inc.: Red Hook, NY, USA, 2017; Volume 30. [Google Scholar]
Figure 1. (a) Integrated circuit board; (b) low-cost integrated sensor.
Figure 2. Location map of nine experimental sites in Wuhan, with surrounding scenes of each low-cost sensor deployment.
Figure 3. Structure of data processing and calibration.
Figure 4. Comparison of air temperature data from nine low-cost sensors with nearby standard weather stations before calibration.
Figure 5. MAE between hourly air temperatures across four seasons, comparing data from the standard weather stations, low-cost sensors (original), MLR-based calibration, and LightGBM-based calibration.
Figure 6. Temporal variation of hourly air temperature data from several consecutive days, randomly selected for low-cost sensors before calibration (Sensor), standard weather stations (Station), and low-cost sensors after LightGBM calibration (LightGBM) across four seasons: (a) spring, (b) summer, (c) autumn, and (d) winter. Site No. 5 is used randomly as a demonstration.
Figure 7. (a) R2, (b) MAE, and (c) RMSE values for annual hourly air temperature data from standard weather stations and low-cost sensors (original) after calibration using MLR and LightGBM.
Figure 8. Box plots of hourly air temperature differences between standard weather stations and low-cost sensors across different months. Comparison results of (a) No. 2 (located in the south of Figure 2); (b) No. 3 (located in the middle and east of Figure 2); (c) No. 4 (located in the north of Figure 2).
Figure 9. Performance of air temperature data from nine low-cost sensors and standard weather stations (a) before and (b) after calibration using the LightGBM method.
Figure 10. Comparison of raw air temperature data from low-cost sensors (sensor), reference data from standard weather stations (station), and LightGBM-calibrated data (LightGBM) at four randomly selected sites. The long-term daily averages over a 12-month period are displayed.
Table 1. Details of low-cost sensor components.

| Type | Manufacturer | Model | Precision |
|---|---|---|---|
| Temperature | Silicon Labs (Austin, TX, USA) | Si705x | ±0.1 °C |
| Humidity | Sensirion (Stäfa, Zurich Canton, Switzerland) | SHT35 | ±1.5% RH |
| Pressure | Bosch (Gerlingen, Baden-Württemberg, Germany) | BMP280 | ±0.12 hPa |
Table 2. Standard weather station names, low-cost sensor IDs, and surface types for nine sites.

| No. | Station | Sensor | Surface Type |
|---|---|---|---|
| 1 | XJS | 005 | Woodlands |
| 2 | SXY | 019 | Grasslands |
| 3 | LZJ | 021 | Cultivated land |
| 4 | HPYQ | 139 | Cultivated land |
| 5 | HPDT | 166 | Cultivated land |
| 6 | CZJ | 175 | Woodlands |
| 7 | HPSK | 249 | Built-up areas |
| 8 | CXL | 283 | Shrubland |
| 9 | WJH | 286 | Shrubland |
Table 3. Statistics and comparison of raw air temperature data from low-cost sensors and standard weather stations. Results are in degrees Celsius.

| No. | Type | Min | Max | Median | Mean |
|---|---|---|---|---|---|
| 1 | Sensor | −3.526 | 49.005 | 18.021 | 18.513 |
| 1 | Station | −3.000 | 41.000 | 18.200 | 18.091 |
| 2 | Sensor | −5.146 | 45.088 | 17.531 | 17.463 |
| 2 | Station | −4.900 | 40.900 | 17.800 | 17.635 |
| 3 | Sensor | −2.650 | 46.754 | 19.047 | 19.543 |
| 3 | Station | −2.900 | 40.900 | 19.000 | 18.814 |
| 4 | Sensor | −5.052 | 46.685 | 16.658 | 17.036 |
| 4 | Station | −5.400 | 40.400 | 17.150 | 16.995 |
| 5 | Sensor | −3.827 | 49.813 | 18.364 | 18.934 |
| 5 | Station | −3.500 | 40.700 | 18.300 | 18.173 |
| 6 | Sensor | −4.153 | 48.334 | 18.510 | 18.416 |
| 6 | Station | −2.800 | 40.400 | 18.600 | 18.401 |
| 7 | Sensor | −2.312 | 49.289 | 20.045 | 20.453 |
| 7 | Station | −1.900 | 41.500 | 19.700 | 19.540 |
| 8 | Sensor | −5.346 | 46.116 | 17.357 | 17.812 |
| 8 | Station | −4.900 | 40.900 | 17.500 | 17.488 |
| 9 | Sensor | −3.439 | 48.572 | 17.685 | 18.350 |
| 9 | Station | −3.200 | 40.800 | 18.500 | 18.262 |
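The gap between sensor and station readings in Table 3 can be made explicit with a few subtractions. The sketch below copies the Max and Mean columns from the table and prints the sensor-minus-station difference for each site. The variable names are illustrative only.

```python
# (site, sensor max, station max, sensor mean, station mean) in °C, from Table 3.
table3 = [
    (1, 49.005, 41.000, 18.513, 18.091),
    (2, 45.088, 40.900, 17.463, 17.635),
    (3, 46.754, 40.900, 19.543, 18.814),
    (4, 46.685, 40.400, 17.036, 16.995),
    (5, 49.813, 40.700, 18.934, 18.173),
    (6, 48.334, 40.400, 18.416, 18.401),
    (7, 49.289, 41.500, 20.453, 19.540),
    (8, 46.116, 40.900, 17.812, 17.488),
    (9, 48.572, 40.800, 18.350, 18.262),
]

# Sensor-minus-station gaps at the maximum and at the mean, keyed by site number.
max_gaps = {site: s_max - st_max for site, s_max, st_max, _, _ in table3}
mean_gaps = {site: s_mean - st_mean for site, _, _, s_mean, st_mean in table3}

for site in max_gaps:
    print(f"No. {site}: max gap = {max_gaps[site]:+.3f} °C, "
          f"mean gap = {mean_gaps[site]:+.3f} °C")

largest = max(max_gaps, key=max_gaps.get)
print(f"Largest overshoot at the maximum: Site No. {largest}")
```

On these figures the sensor maxima exceed the station maxima by roughly 4 to 9 °C at every site, whereas the means differ by less than 1 °C, which fits the discussion's emphasis on high-value data.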
Table 4. Statistics and comparison of the effects of multiple scenes and distances on calibration performance.

| Calibrated | Model | R2 (LightGBM) | Surface Type | Distance/km |
|---|---|---|---|---|
| No. 5 Sensor (Cultivated land; R2 (Original): 0.447) | No. 7 | 0.936 | Built-up areas | 10 |
| | No. 3 | 0.956 | Cultivated land | 11 |
| | No. 9 | 0.950 | Shrubland | 18 |
| | No. 6 | 0.915 | Woodlands | 24 |
| | No. 1 | 0.942 | Woodlands | 29 |
| | No. 8 | 0.941 | Shrubland | 35 |
| | No. 2 | 0.915 | Grasslands | 48 |
| | No. 4 | 0.823 | Cultivated land | 50 |
| No. 6 Sensor (Woodlands; R2 (Original): 0.489) | No. 1 | 0.946 | Woodlands | 9 |
| | No. 9 | 0.933 | Shrubland | 10 |
| | No. 3 | 0.919 | Cultivated land | 19 |
| | No. 8 | 0.925 | Shrubland | 21 |
| | No. 5 | 0.896 | Cultivated land | 24 |
| | No. 7 | 0.898 | Built-up areas | 34 |
| | No. 4 | 0.917 | Cultivated land | 39 |
| | No. 2 | 0.979 | Grasslands | 72 |
| No. 8 Sensor (Shrubland; R2 (Original): 0.416) | No. 1 | 0.935 | Woodlands | 13 |
| | No. 9 | 0.937 | Shrubland | 17 |
| | No. 4 | 0.917 | Cultivated land | 18 |
| | No. 6 | 0.934 | Woodlands | 21 |
| | No. 5 | 0.929 | Cultivated land | 35 |
| | No. 3 | 0.916 | Cultivated land | 36 |
| | No. 7 | 0.888 | Built-up areas | 43 |
| | No. 2 | 0.934 | Grasslands | 75 |
Table 5. The sample size after dividing the air temperature value interval and the index evaluation results before and after calibration.

| Value Range/°C | Sample Size | Original R2 | Original MAE | Original RMSE | LightGBM-Calibrated R2 | LightGBM-Calibrated MAE | LightGBM-Calibrated RMSE |
|---|---|---|---|---|---|---|---|
| [−6, 2] | 4929 | −1.246 | 4.724 | 6.159 | 0.785 | 1.503 | 1.905 |
| [2, 10] | 13,354 | 0.110 | 3.884 | 5.454 | 0.874 | 1.569 | 2.051 |
| [10, 18] | 18,578 | −0.015 | 5.134 | 6.672 | 0.873 | 1.819 | 2.357 |
| [18, 26] | 18,029 | 0.116 | 5.393 | 6.926 | 0.906 | 1.752 | 2.257 |
| [26, 34] | 13,633 | −0.224 | 5.705 | 7.656 | 0.904 | 1.656 | 2.147 |
| [34, 42] | 5285 | −2.389 | 9.512 | 11.113 | 0.877 | 1.636 | 2.114 |
| [42, 50] | 1003 | −6.653 | 12.691 | 13.698 | 0.839 | 1.530 | 1.984 |
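The negative R2 values in the Original columns of Table 5 follow from the definition R2 = 1 − SS_res/SS_tot: inside a narrow value range the reference temperatures vary little, so SS_tot is small and any sizeable sensor error pushes R2 below zero. The sketch below reproduces that effect on synthetic data. Binning by the raw sensor reading, the half-open intervals, the function names, and all sample values are assumptions for the example, not details taken from the study.

```python
import math

def r2_mae_rmse(y_true, y_pred):
    """Return (R2, MAE, RMSE); R2 is NaN if the reference has no variance."""
    n = len(y_true)
    mean_true = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot if ss_tot > 0 else float("nan")
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    return r2, mae, math.sqrt(ss_res / n)

def metrics_by_range(reference, readings, edges):
    """Group samples by the value range of the reading and score each group.

    Ranges are half-open [lo, hi); groups with fewer than two samples are skipped.
    """
    results = {}
    for lo, hi in zip(edges[:-1], edges[1:]):
        idx = [i for i, v in enumerate(readings) if lo <= v < hi]
        if len(idx) < 2:
            continue
        ref = [reference[i] for i in idx]
        obs = [readings[i] for i in idx]
        results[(lo, hi)] = (len(idx),) + r2_mae_rmse(ref, obs)
    return results

# Synthetic data: the sensor tracks the station at moderate temperatures
# but overshoots by several degrees at the high end.
reference = [30.0, 31.0, 32.0, 33.0, 36.0, 37.0, 38.0, 39.0]
readings = [31.0, 30.5, 33.0, 32.5, 44.0, 46.0, 43.0, 47.0]

results = metrics_by_range(reference, readings, edges=[26, 34, 42, 50])
for (lo, hi), (n, r2, mae, rmse) in results.items():
    print(f"[{lo}, {hi}): n={n} R2={r2:.3f} MAE={mae:.3f} RMSE={rmse:.3f}")
```

In this toy case the moderate range scores R2 = 0.5 with an MAE of 0.75 °C, while the high range scores a strongly negative R2 with an MAE of 7.5 °C, even though both groups have the same reference variance. A negative within-range R2 therefore signals that the errors are large relative to the spread of the reference values in that range, not that the metric is miscomputed.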
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Nan, F.; Zeng, C.; Shen, H.; Lin, L. Calibration of Integrated Low-Cost Environmental Sensors for Urban Air Temperature Based on Machine Learning. Sensors 2025, 25, 3398. https://doi.org/10.3390/s25113398
