1. Introduction
High-quality meteorological data with high horizontal resolution in urban areas are essential for user-specific application services addressing weather phenomena such as flash floods, heat waves, strong winds, drought, and road ice [1,2,3,4,5]. Surface temperature around urban building blocks is inhomogeneous at horizontal scales of about 10 m and temporal scales of about 1 min [2,6]. High-resolution, low-cost sensor networks are being used in many megacities worldwide to compensate for the low horizontal resolution of nationwide meteorological observation networks [3]. These networks include meteorological sensors as well as air quality, noise, and vibration sensors to deliver useful information to citizens. The “Array of Things” project developed in Chicago is an example [4].
National meteorological and air quality data have been maintained and operated by national meteorological administrations (e.g., the National Weather Service in the U.S. and the Korea Meteorological Administration in Korea) and national environmental administrations (e.g., the Environmental Protection Agency in the U.S. and the Ministry of Environment in Korea), respectively. The automated synoptic observing system (ASOS) and automatic weather system (AWS), controlled by the World Meteorological Organization (WMO), provide superior data quality, and their quality control is well organized [7]. Automatically sampled and transmitted data contain errors that arise not only in the sensor itself but also in electrical connections and telecommunication [5]. Sensor errors fall into the following five categories: (1) errors due to the failure of some system component, (2) calibration errors, (3) sensor drift in gain or bias-type errors, (4) exposure errors, and (5) system noise [8]. Electrical and telecommunication errors can be random or systematic.
Quality management provides the principles and methodological framework for the operation and coordinates activities regarding quality [7]. Quality control is associated with components that are used to ensure that the quality requirements are met, and it includes all operational techniques and activities used to fulfill the quality requirements. Quality management aims to ensure that data meet the requirements for uncertainty, resolution, continuity, homogeneity, representativeness, timeliness, and format for the intended application at a minimum practical cost.
Most meteorological observation networks have their own quality control systems. For example, Nordic countries have adopted station QC (QC0), real-time QC (QC1), non-real-time QC (QC2), and human QC (HQC) [9]. The Oklahoma Mesonet has adopted climate range, step, persistence, spatial, and like-instrument tests [10,11]. Moreover, the Korea Meteorological Administration (KMA) developed a quality management system (QMS) for meteorological data [12]. All meteorological data were collected into the Combined Meteorological Information System (COMIS), and their quality was assessed and controlled using a real-time quality control system for meteorological observation data (RQMOD) developed by the KMA, which consists of physical limit, climate range, step, persistence, internal consistency, and median filter checks [13].
Recently, many large cities have been expanding their Internet of Things (IoT) sensor networks to achieve higher horizontal resolution. Seoul, South Korea, established and expanded an IoT sensor network called the Smart Seoul Data of Things (S-DoT) [2]. The S-DoT aimed not only to assist municipal policies related to quality of life, such as air quality, noise, and odor, but also to deliver useful information, including meteorological variables, to citizens. Most IoT networks use micro-electro-mechanical system (MEMS) sensors, which are much cheaper than the high-quality and expensive sensors used by national meteorological observation networks. Additionally, most sensors are installed in unsuitable observational environments among urban obstacles, such as buildings. Thus, IoT sensor networks require a more rigorous QMS than conventional meteorological observation networks. Zhang et al. addressed the importance of data quality in IoT networks [14]. Missing and incorrect values in these networks should be removed or replaced with correct values based on quality control [15].
This study aimed to develop a QMS for an IoT meteorological sensor installed in a non-ideal environment and apply it to S-DoT meteorological sensors in Seoul, South Korea. The air temperatures obtained by the S-DoT were compared with those obtained by the ASOS and AWS operated by the KMA on heatwave and coldwave event days. The QMS for S-DoT meteorological sensors (QMS-SDM) was applied to the S-DoT data from August 2020 to July 2021. This study highlights the application of QMS-SDM.
3. Climatology on Heatwave and Coldwave Event Days
The threshold values for climate range tests must be determined before quality control can be applied. To this end, the climatology of heatwave and coldwave event days was investigated. Heatwave and coldwave events are synoptic-scale weather phenomena. However, urban areas tend to show higher temperatures during a heatwave period due to extra heat storage in and release from materials such as concrete and asphalt [6,16]. The spatiotemporal distributions of temperature obtained using the S-DoT IoT sensors were compared with those obtained from the KMA-operated ASOS.
3.1. Heatwave Event Day (24 July 2021)
The heatwave event day was selected as 24 July 2021, the hottest day in Seoul in 2021. The average, maximum, and minimum temperatures at the Seoul ASOS station were 31.7, 36.5, and 26.7 °C, respectively.
Figure 2 shows the horizontal distribution of the daily mean, maximum, and minimum temperatures on the heatwave event day. The spatially averaged daily mean temperature was 33.1 °C, which was higher than that recorded by the ASOS by 1.4 °C (Figure 2a). Further, 89% of the S-DoT stations showed a higher daily mean temperature than the ASOS, whereas only 11% showed lower daily mean values. The spatially averaged maximum temperature was 39.3 °C, which was higher by 2.8 °C than that recorded by the Seoul ASOS (Figure 2b). Moreover, 97% of the S-DoT stations exhibited a higher maximum temperature than that recorded by the ASOS. The spatial average of the daily minimum temperature was 28.3 °C, which was higher than the ASOS temperature by 1.6 °C. Notably, the temperatures in central Seoul were much higher than those in the northern, southernmost, and eastern areas.
The highest daily maximum temperature was recorded at the Doksan Library Station, which is located near mountainous regions, although temperatures near mountainous regions were generally lower than those in the surrounding areas (Figure 3c). The second- and third-highest daily maximum temperatures were recorded at the Changsin and Myeongryun Stations in Jongro-gu District (Figure 2b and Figure 3a,b). The sensor boxes of these stations were attached to west-facing or south-facing walls (Figure 3). Thus, the sensors installed on walls could not represent the surrounding areas. Solar radiation heats the wall directly, and the heated wall could affect the sensor through conduction [6,16].
The stations can be clustered according to temporal variation patterns. In this study, the time series of all stations were classified into five clusters using the dynamic time warping (DTW) clustering technique (Figure 4) [17,18].
Figure 5 shows the time series of centroid temperatures for each cluster. Cluster 1, which had stations in the urban center, showed normal temperatures during the day and higher temperatures at night. Cluster 2, which had stations near the mountainous areas, showed much lower temperatures at night and slightly lower temperatures during the day. Cluster 3 was similar to Cluster 1, but the former exhibited slightly higher temperatures during the day and similar temperatures at night. The distribution of Cluster 3 was also similar to that of Cluster 1, but Cluster 3 was concentrated in the western part of Seoul, whereas Cluster 1 was evenly scattered throughout downtown Seoul. Further, Cluster 4 exhibited a slightly lower temperature at night and was located between Clusters 1 and 2. Cluster 5, which had stations in the center of the downtown areas, showed much higher temperatures during both the day and night. Cluster analyses implied that Seoul could be classified into several climate zones according to the location of the station.
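As an illustration of the distance measure underlying DTW clustering, the following minimal sketch computes the classic DTW distance between two temperature series and assigns each series to its nearest centroid. It is not the clustering code used in this study: the function names `dtw_distance` and `assign_to_centroids` are hypothetical, the local cost is assumed to be the absolute difference, and the centroids are assumed to be given.

```python
import numpy as np

def dtw_distance(a, b):
    """Classic dynamic time warping distance between two 1-D series.

    Assumption: absolute difference as local cost, no warping window.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    n, m = len(a), len(b)
    # cost[i, j] = minimal accumulated cost aligning a[:i] with b[:j]
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # step in a only
                                 cost[i, j - 1],      # step in b only
                                 cost[i - 1, j - 1])  # step in both
    return cost[n, m]

def assign_to_centroids(series, centroids):
    """Return, for each series, the index of the DTW-nearest centroid."""
    return [int(np.argmin([dtw_distance(s, c) for c in centroids]))
            for s in series]
```

Because DTW allows the time axis to stretch, two stations whose diurnal curves have the same shape but a small time lag end up close to each other, which is the property that makes it attractive for grouping stations by temporal variation pattern.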
3.2. Coldwave Event Day (8 January 2021)
The coldwave event day was selected as 8 January 2021, the coldest day in Seoul in 2021. The average, maximum, and minimum temperatures at the Seoul ASOS station were −14.9, −10.7, and −18.6 °C, respectively.
Figure 6 shows the horizontal distribution of the daily mean, maximum, and minimum temperatures on the coldwave event day. The spatially averaged daily mean temperature was −12.9 °C, which was higher by 2.0 °C than that of the ASOS (Figure 6a). Further, 91% of the S-DoT stations showed a higher daily mean temperature than the temperature recorded at the ASOS, whereas only 9% showed lower mean values. The spatially averaged maximum temperature was −8.4 °C, which was higher by 2.3 °C than that recorded by the Seoul ASOS (Figure 6b). Moreover, 91% of the S-DoT stations exhibited a higher maximum temperature than the ASOS. The spatial average of the minimum temperature was −16.2 °C, which was higher by 1.6 °C than that recorded by the ASOS. Furthermore, 95% of the stations showed a higher minimum temperature than the ASOS, whereas only 5% showed a lower minimum temperature (Figure 6c).
The highest daily maximum temperature, 5.8 °C, was recorded at Changsin Station, which was also the second hottest station on the heatwave event day (Figure 3a). A stovepipe outlet released heat about 1 m away from the sensor on the same wall. The wall heated by direct solar radiation, together with heat from the stovepipe outlet, may have increased the sensor temperature abruptly. The sensor boxes at the stations with the three highest maximum temperatures were all attached to walls. It can be concluded that stations whose sensor boxes were attached to a building wall exhibited much higher temperatures than the other stations, implying that they could not represent the surrounding local climate zones. Thus, these stations should be removed before the basic quality control.
4. Quality Management System for S-DoT Meteorological Sensors (QMS-SDM)
The QMS-SDM comprises four steps: pre-processing (QC0), basic quality control for a single station in real-time (QC1), extended quality control for multiple stations near real-time (QC2), and spatiotemporal gap-filling for multiple stations for daily data (QC3) (Figure 7). Each step has its own flag rule.
The stations installed on walls were removed before the QMS-SDM because the data did not represent the surrounding local climate zones, as mentioned in Section 3 (Figure 3). The first flag (A) addressed whether the observation environment was good or bad (QC00). If the first flag was 2, the remaining flags became meaningless.
4.1. Pre-Processing (QC0)
Pre-processing has two steps: time allocation (QC01) and filling short missing data (QC02). The QC01 step allocates irregular sampling times to regular observation times at 2 min intervals. First, the observation times were set to every 2 min from midnight; subsequently, each sampling time was allocated to the most recent observation time preceding it.
Short runs of data were missing owing to electrical problems or incomplete communication. Missing data shorter than 6 min (three data points) were filled using a linear regression equation fitted to the most recent five data points. This is justified because the autocorrelation function at a lag of 6 min is above 0.98 for air temperature and relative humidity.
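The two pre-processing steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the operational implementation: the function names are hypothetical, "allocation in the backward direction" is read as flooring each timestamp to the 2 min grid, the last sample is kept when two samples fall into the same slot, and a gap of up to three points is treated as short.

```python
import numpy as np
import pandas as pd

def allocate_to_grid(series, freq="2min"):
    """QC01 sketch: floor irregular sampling times to a regular 2-min grid.

    Assumption: if several samples land in one slot, the last one is kept.
    """
    snapped = series.copy()
    snapped.index = snapped.index.floor(freq)
    snapped = snapped[~snapped.index.duplicated(keep="last")]
    grid = pd.date_range(snapped.index.min(), snapped.index.max(), freq=freq)
    return snapped.reindex(grid)  # slots without a sample become NaN

def fill_short_gaps(values, max_gap=3, n_fit=5):
    """QC02 sketch: fill gaps of at most `max_gap` points by extrapolating
    a straight line fitted to the `n_fit` points just before the gap."""
    x = np.asarray(values, dtype=float).copy()
    filled = np.zeros(len(x), dtype=bool)
    i = 0
    while i < len(x):
        if not np.isnan(x[i]):
            i += 1
            continue
        j = i
        while j < len(x) and np.isnan(x[j]):
            j += 1                      # j is the first index after the gap
        prev = x[i - n_fit:i] if i >= n_fit else np.array([])
        if (j - i) <= max_gap and len(prev) == n_fit and not np.isnan(prev).any():
            slope, intercept = np.polyfit(np.arange(n_fit), prev, 1)
            x[i:j] = slope * np.arange(n_fit, n_fit + (j - i)) + intercept
            filled[i:j] = True          # candidates for the "filled" flag
        i = j
    return x, filled
```

Returning the boolean mask alongside the values keeps the information needed to set the corresponding "filled" flag later.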
4.2. Basic Quality Control for A Single Station in Real-Time (QC1)
The basic quality control for a single station in real-time has five quality checks: (1) physical limit check (QC11), (2) climate range check (QC12), (3) internal consistency check (QC13), (4) persistence check (QC14), and (5) step check (QC15). The physical limit check (QC11) excludes data with physically impossible values (WMO, 2002), whereas the climate range check (QC12) excludes climatological outlier values. The upper (PLu and CRu) and lower (PLl and CRl) threshold values for the physical limit and climate range checks must be determined in advance. The internal consistency check (QC13) detects and excludes data inconsistent with other data (WMO, 2002). The persistence check (QC14) detects data that do not vary with time; most persistent data result from electrical problems. The step check (QC15) detects data that vary abruptly. The threshold values for QC14 and QC15 (PC and SC, respectively) must also be determined in advance.
Table 3 shows the lower and upper threshold values for physical limit checks applied to the ASOS/AWS operated by the KMA and AWS operated by local governments. The KMA and local governments used the same threshold values for relative humidity, wind speed, and direction. The AWS (local governments) applied −45 °C to the lower threshold values for air temperature, whereas the ASOS/AWS (KMA) applied −35 °C. The S-DoT sensors showed different lower and upper limits. The lower and upper limits were set to be −40 °C and 80 °C for air temperature, 0% and 100% for relative humidity, 0 m s⁻¹ and 60 m s⁻¹ for wind speed, and 0° and 360° for wind direction, respectively (Table 3).
The lower and upper limits for climate range checks were determined according to the monthly climatology (Table 4) [19]. The KMA determined the lower and upper limits as the mean minus and plus n times the standard deviation, where n was 3–9 [9,10,19,20].
Most S-DoT sensors were installed in downtown Seoul. Based on the results in Section 3, the upper temperature limits of the QMS-SDM were set 7 °C higher than those of the KMA (histograms in Figure 3 and Figure 6). The lower temperature limits of the QMS-SDM were set to be the same as those of the KMA.
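The limit rule described above (mean minus and plus n standard deviations, with the upper limit raised for the urban network) can be written as a short sketch. The function name and the sample values are hypothetical; only the form of the rule, the range of n, and the 7 °C offset come from the text.

```python
import numpy as np

def climate_range_limits(monthly_values, n=3.0, upper_offset=0.0):
    """Return (lower, upper) climate range limits for one month.

    lower = mean - n * std
    upper = mean + n * std + upper_offset
    Assumption: sample standard deviation (ddof=1).
    """
    v = np.asarray(monthly_values, dtype=float)
    mean = v.mean()
    std = v.std(ddof=1)
    return mean - n * std, mean + n * std + upper_offset

# Hypothetical July temperatures (deg C), illustration only
base = climate_range_limits([24.0, 26.0, 28.0], n=3)
urban = climate_range_limits([24.0, 26.0, 28.0], n=3, upper_offset=7.0)
```

Only the upper limit moves with `upper_offset`, which mirrors the choice of keeping the lower limits identical to the reference network.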
The duration for the persistence test was set to 180 and 240 min for temperature and wind, respectively, by both the KMA and local governments (Table 5). As this duration already represents the longest time for the persistence check, the QMS-SDM used the same duration for the persistence test.
The allowable abrupt change over 1 min was set to 3 °C for temperature, 10% for relative humidity, and 10 m s⁻¹ for wind speed by the KMA and local governments. Although the sampling interval of the S-DoT was 2 min, twice that of the ASOS and AWS, the QMS-SDM was set to have the same threshold values for step checks for temperature, relative humidity, and wind speed (Table 5).
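A minimal sketch of three of the single-station checks for air temperature is given below, using the thresholds quoted in this section (−40 °C and 80 °C physical limits, 3 °C step, 180 min persistence). The function names are hypothetical, and the conversion of 180 min to a window of 90 consecutive samples at 2 min spacing is an assumption of this sketch.

```python
import numpy as np

def physical_limit_flags(x, lower=-40.0, upper=80.0):
    """QC11 sketch: True where a value lies outside the physical limits."""
    x = np.asarray(x, dtype=float)
    return (x < lower) | (x > upper)

def step_flags(x, max_step=3.0):
    """QC15 sketch: True where the change from the previous sample
    exceeds `max_step`. The first sample has no predecessor."""
    x = np.asarray(x, dtype=float)
    flags = np.zeros(len(x), dtype=bool)
    flags[1:] = np.abs(np.diff(x)) > max_step
    return flags

def persistence_flags(x, window=90):
    """QC14 sketch: True once a value has repeated unchanged for at
    least `window` consecutive samples (assumed 90 x 2 min = 180 min)."""
    x = np.asarray(x, dtype=float)
    flags = np.zeros(len(x), dtype=bool)
    run = 1  # length of the current run of identical values
    for i in range(1, len(x)):
        run = run + 1 if x[i] == x[i - 1] else 1
        if run >= window:
            flags[i] = True
    return flags
```

Each function returns a boolean mask rather than modifying the data, so the result of every check can be stored in its own flag position.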
4.3. Extended Quality Control for Multiple Stations near Real-Time (QC2)
The extended quality control for multiple stations near real-time comprises only one step: a spatial outlier check (QC21). The Madsen–Allerup method was applied to find spatial outlier data [21,22]. The median test statistic value $T_{it}$ is defined as:

$$T_{it} = \frac{x_{it} - M_t}{q_{75} - q_{25}}$$

where $x_{it}$ is the observation at station i at time t, $M_t$ is the median value of the N observations at time t, and $q_{25}$ and $q_{75}$ are the 25% and 75% quantile values of the N observations, respectively. If $T_{it}$ exceeds the critical value, the data become unreliable. The air temperature was modified as a height-adjusted temperature considering the lapse rate of dry air temperature (9.8 °C km⁻¹) before applying this step [23].
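The spatial outlier check can be sketched as follows. The statistic is taken as the deviation of each observation from the network median, scaled by the interquartile range of the N observations at that time, following the definitions above. The function names are hypothetical, the critical value of 2.0 is a placeholder rather than the value used in this study, and adjusting temperatures to a reference height of 0 m is an assumption.

```python
import numpy as np

def height_adjust(temp_c, height_m, lapse_c_per_km=9.8):
    """Reduce temperatures to a common reference height (assumed 0 m)
    using the 9.8 deg C per km lapse rate."""
    temp_c = np.asarray(temp_c, dtype=float)
    height_m = np.asarray(height_m, dtype=float)
    return temp_c + lapse_c_per_km * height_m / 1000.0

def madsen_allerup_flags(obs, threshold=2.0):
    """Return (flags, t) for one time step.

    t = (x - median) / (q75 - q25); flags is True where |t| > threshold.
    `threshold` is a placeholder critical value, not the paper's.
    """
    x = np.asarray(obs, dtype=float)
    med = np.nanmedian(x)
    q25, q75 = np.nanpercentile(x, [25, 75])
    iqr = q75 - q25
    if iqr == 0:
        # All central values identical: the statistic is undefined,
        # so nothing is flagged in this sketch.
        return np.zeros(len(x), dtype=bool), np.zeros(len(x))
    t = (x - med) / iqr
    return np.abs(t) > threshold, t
```

Using the median and quartiles instead of the mean and standard deviation keeps the statistic robust: a single wall-heated sensor barely shifts the reference against which it is judged.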
4.4. Data Reconstruction Using Spatiotemporal Gap-Filling for Daily Data (QC3)
QC3 comprised spatial gap filling (QC31) and temporal gap filling (QC32). The data removed during the QC21 step were reconstructed by averaging the values at the nearest three stations within 2 km. If only one or two stations were available within this distance, the average of those stations was used instead. QC31 was applied to air temperature, relative humidity, and wind speed.
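A minimal sketch of the spatial gap-filling rule is shown below: each missing value is replaced by the mean of up to three nearest valid stations within 2 km, and it is left missing when no such station exists. The function names are hypothetical, and the great-circle (haversine) distance is an assumption, since the distance measure is not specified in the text.

```python
import numpy as np

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (assumed mean Earth radius 6371 km)."""
    p1, p2 = np.radians(lat1), np.radians(lat2)
    dphi = p2 - p1
    dlmb = np.radians(lon2) - np.radians(lon1)
    a = np.sin(dphi / 2) ** 2 + np.cos(p1) * np.cos(p2) * np.sin(dlmb / 2) ** 2
    return 2 * 6371.0 * np.arcsin(np.sqrt(a))

def spatial_fill(values, lats, lons, radius_km=2.0, k=3):
    """Fill NaNs with the mean of up to `k` nearest valid stations
    located within `radius_km`. Only originally valid values are used
    as donors, so filled values never feed other fills."""
    values = np.asarray(values, dtype=float)
    lats = np.asarray(lats, dtype=float)
    lons = np.asarray(lons, dtype=float)
    filled = values.copy()
    for i in np.flatnonzero(np.isnan(values)):
        d = haversine_km(lats[i], lons[i], lats, lons)
        ok = (~np.isnan(values)) & (d <= radius_km)
        idx = np.flatnonzero(ok)
        if idx.size:
            nearest = idx[np.argsort(d[idx])[:k]]
            filled[i] = values[nearest].mean()
    return filled
```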
Temporal gap filling was applied to the data with missing periods of less than 30 min (15 data points). Further, the Stineman method was applied to fill in the missing data [24], and this method shows better performance for data with sudden slope changes and volatility clustering [25]. The method was selected based on the results of the performance test. The performance with respect to the imputation methods is provided in Tables S1 and S2.
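The gap-length rule can be illustrated with the sketch below. Note that it uses plain linear interpolation as a stand-in and does not implement the Stineman method; the function name is hypothetical, and treating a gap of up to 15 points as fillable is an assumption about how the 30 min limit is counted.

```python
import numpy as np

def fill_gaps_up_to(values, max_gap=15):
    """Fill interior gaps of at most `max_gap` points.

    Stand-in interpolator: linear (np.interp), NOT the Stineman method.
    Gaps at the start or end of the series are left untouched because
    they have a valid neighbour on only one side.
    """
    x = np.asarray(values, dtype=float)
    out = x.copy()
    valid = ~np.isnan(x)
    if valid.sum() < 2:
        return out
    idx = np.arange(len(x))
    interp = np.interp(idx, idx[valid], x[valid])
    i = 0
    while i < len(x):
        if valid[i]:
            i += 1
            continue
        j = i
        while j < len(x) and not valid[j]:
            j += 1                       # j is the first index after the gap
        if i > 0 and j < len(x) and (j - i) <= max_gap:
            out[i:j] = interp[i:j]
        i = j
    return out
```

Swapping in another interpolator only requires replacing the `np.interp` line, since the gap-length logic is independent of the interpolation scheme.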
4.5. Flag Rules
Each step has its own flag, and the flags together form a 10-digit code with one digit per step (A to J). Character 0 represented the status “normal”, while Character 1 or 2 represented “not normal”. Flag 1 in Steps B, I, and J implied filled data, whereas Flag 1 in Steps E, F, G, and H implied doubtful or unreliable data. Flag 2 in Step A implied a bad observation environment, and Flag 2 in Steps C and D implied erroneous data (Table 6).
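The flag rule can be illustrated by encoding and decoding a 10-character flag string. This sketch relies only on the meanings stated above; the function names are hypothetical, and the ordering of Steps A to J as positions 1 to 10 of the string is an assumption, since Table 6 is not reproduced here.

```python
STEPS = "ABCDEFGHIJ"  # assumed order of the 10 flag positions

# Meanings of the non-zero characters, as described in the text
MEANING = {("A", "2"): "bad observation environment"}
MEANING.update({(s, "1"): "filled" for s in "BIJ"})
MEANING.update({(s, "1"): "doubtful or unreliable" for s in "EFGH"})
MEANING.update({(s, "2"): "erroneous" for s in "CD"})

def encode_flags(raised):
    """Build the flag string from {step: character}; other steps get '0'."""
    return "".join(raised.get(s, "0") for s in STEPS)

def decode_flags(flag):
    """Return {step: meaning} for every step that is not normal."""
    if len(flag) != len(STEPS):
        raise ValueError("flag must have exactly 10 characters")
    return {s: MEANING.get((s, c), "unknown")
            for s, c in zip(STEPS, flag) if c != "0"}

def environment_ok(flag):
    """False when Step A marks a bad observation environment, in which
    case the remaining flags carry no useful information."""
    return flag[0] != "2"
```

For example, `encode_flags({"C": "2", "I": "1"})` yields a string whose third character marks erroneous data and whose ninth character marks a filled value.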
6. Summary and Conclusions
A quality management system for an IoT meteorological sensor network was developed. The QMS was applied to the Seoul S-DoT, one of the largest and most diverse big data sets globally. The S-DoT sensors were installed over realistic urban areas, such as downtown areas, urban parks, and river sides, and not on ideal surfaces. The local climate zones near IoT network stations differed completely from those near the stations operated by the national meteorological administration [26,27]. Owing to irregular data transmission, the observation time was not regular. Some missing data were recorded as zero, which could not be distinguished from an observed zero value. Before the main quality control, a pre-processing step was added to unify the format.
The horizontal and temporal variations in temperature observed in the S-DoT network during heatwave and coldwave event days were investigated. Diurnal variation patterns observed in the S-DoT network were consistent with those obtained at the synoptic-scale ASOS station. The average, maximum, and minimum temperatures from the S-DoT were found to be higher than those from the ASOS on both event days. The temperatures at a few S-DoT stations were much higher than those at other stations because of nearby heat sources, such as walls. As these data could not represent the surrounding local climate zones, they should be removed before the main QC.
Diurnal variation in temperature was classified into five clusters using DTW clustering. In general, stations with high temperatures during the daytime were located in the center of Seoul, whereas stations with low temperatures at night were located in forest areas near the city boundaries. Clustering analyses indicated that the upper and lower threshold values for climate range checks can be determined with respect to the cluster.
The QMS-SDM was designed to include two pre-processes, five basic quality controls, one extended quality control, and two data reconstructions using gap filling. Quality control methodology and threshold values were defined based on previous studies and the present study. Normal, doubtful, and erroneous information for each QC step was saved as the corresponding flag file.
The QMS-SDM was applied to the S-DoT meteorological data from August 2020 to July 2021. Available data increased by 20% using the temporal imputation because the missing pattern was completely random. Using the QMS-SDM, data with irregular and diverse formats were changed to a regular and unified format. Furthermore, QC data are expected to provide high-quality and high-resolution urban meteorological information services effectively.
Moreover, the QMS-SDM can be applied to different IoT meteorological sensor networks. Pre-processing depends on the IoT sensor networks and should be reorganized. If the time interval is constant, then the synchronicity process may be unnecessary. Short missing data can be imputed using linear regression or the Stineman method. Furthermore, basic QC is essential for all networks. However, the threshold values should be redefined to match the sensor and climatology. Spatial outlier detection and spatial gap-filling may be useful for most networks.