GIS-Based Approach to Spatio-Temporal Interpolation of Atmospheric CO 2 Concentrations in Limited Monitoring Dataset

: Understanding the magnitude and distribution of the mixes of the near-ground carbon dioxide (CO 2 ) components spatially (related to the surface characteristics) and temporally (over seasonal timescales) is critical to evaluating present and future climate impacts. Thus, the application of in situ measurement approaches, combined with the spatial interpolation methods, will help to explore variations in source contribution to the total CO 2 mixing ratios in the urban atmosphere. This study presents the spatial characteristic and temporal trend of atmospheric CO 2 levels observed within the city of Wroclaw, Poland for the July 2017–August 2018 period. The seasonal variability of atmospheric CO 2 around the city was directly measured at the selected sites using flask sampling with a Picarro G2201-I Cavity Ring-Down Spectroscopy (CRDS) technique. The current work aimed at determining the accuracy of the interpolation techniques and adjusting the interpolation parameters for estimating the magnitude of CO 2 time series/seasonal variability in terms of limited observations during the vegetation and non-vegetation periods. The objective was to evaluate how different interpolation methods will affect the assessment of air pollutant levels in the urban environment and identify the optimal sampling strategy. The study discusses the schemes for optimization of the interpolation results that may be adopted in areas where no observations are available, which is based on the kriging error predictions for an appropriate spatial density of measurement locations. Finally, the interpolation results were extended regarding the average prediction bias by exploring additional experimental configurations and introducing the limitation of the future sampling strategy on the seasonal representation of the CO 2 levels in the urban area. samples of at 6 experimental Three were established over the biome most This study used datasets derived and measured at the nearby two automatic for the Chief Inspectorate of showing variations in CO:CO 2 Moreover, measurements of CO 2 were performed from the laboratory station” by a network close to the at the (NB) interpolation, interpolation based on a triangulated irregular network (TIN), Voronoi polygons, ordinary kriging) were compared in the present study for their ability to accurately produce the interpolated surface of atmospheric CO 2 concentrations. Each interpolation method was tested for effectiveness in determining the CO 2 distribution in the city of Wroclaw in two different periods (vegetation vs. heating seasons) from July 2017 to August 2018. The results obtained demonstrate that the acceptable quantitative and ac-Atmosphere


Introduction
Atmospheric greenhouse gas (GHG) mixing ratio is strongly influenced by multiple human-induced and natural factors. The rising CO2 concentration in the atmosphere is a direct consequence of human activities, including a contribution of fossil fuel-derived emissions into the atmosphere and effects of land-use changes, etc. [1][2][3]. In addition, natural impacts through gas fluxes in ecosystems, carbon uptake and release from soils and sequestration by live vegetation, are also an important part of the terrestrial carbon cycle [4][5][6][7]. Evaluation of the urban carbon cycle, due to the mix of anthropogenic and natural emission effects, is often difficult and shows a greater number of potential uncertainties than in a wider spatial resolution. The accuracy of air quality studies depends on the precision of criteria that are estimated for a certain region meteorological fields and category of emission origins. At the present, the concentration of GHGs is mainly estimated by the measurements from nearest stations or single discrete observations. In practice, ground-based observations include a small number of sampling points in a limited spatial resolution. Datasets gathered in air pollution monitoring are often not complete in certain locations and contain missing value points. As a result, very little is known about the spatio-temporal variation of GHGs at an urban scale.
Given a large amount of daily CO2 variations in the urban environment, the attempt to spatial predictions of air pollution is appropriate [4,5,8]. The overall spatial patterns and the temporal evolution of the average concentration fields (pollution maps) derived from the field measurements with various weight factors (e.g., various meteorological conditions, terrain elevation, etc.) can provide realistic information on air quality in the unmeasured locations. Time information about the typical spatial patterns should also be noted to adequately represent the temporal effects of emission drivers (given emission sources, traffic intensity). The seasonal average CO2 concentration map in particular is assumed to be reasonable to underline the temporal processes of the fine-scale spatial variability of GHG reduction targets in the urban areas.
The problem of filling data gaps in monitoring measure series can be solved by taking the pollutant distribution trend and interpolating the missing data from surrounding values. Building geostatistical spatial prediction models is a tool for achieving higher accuracy in detecting patterns of the transport and distribution of the gaseous compound through the local atmosphere. When a sufficient input measurement dataset is available, the precise predictions of temporal and spatial variation in emission intensity are often undertaken through spatial interpolation techniques. High-performance interpolation is represented as explorative model analysis (comparing observed data to obtained under statistical model) with validation and geo-visualization of estimated values.
Due to the sampling design and/or missing data, there is a need to evaluate the limitations to spatial interpolation tools to better representing gradients of air pollution in unsampled locations and understand the existence of spatial and temporal relationships between air quality and urban land-use patterns. When interpolating ambient air quality concentrations by a kriging scheme or other spatial interpolation techniques, there are a number of important features to be considered: the mechanism of the input datasets generated, spatial representativeness, temporal sampling frequency, spatial dependence of air pollution [9,10].
Geographical information systems (GIS) technology is a tool of great inherent potential for research in all areas of environmental science and engineering. The GIS system allows the dataset to be explored in different ways: description, explanation, prediction of patterns of spatial processes, and model building [11][12][13][14]. GIS spatial interpolation methods have been applied to many disciplines and different fields for decision making and problem-solving [15][16][17][18]. The GIS-based interpolation method is the process of using points with known values to estimate values at other unknown points distributed within an area [19,20]. For example, spatial interpolation can estimate the air quality data at locations without deploying the fine-scale number of monitoring stations or sufficiently detailed data by using known parameters from nearby measured points to cover the entire scale [21][22][23]. The application of the interpolation technique provides conclusions about the geographic differences in distributions of pollutants based on samples from various locations within the study area [24].
GIS is an intelligent and smart technology to evaluate present and future climate impacts. Spatial and temporal variations in GHGs from human activities with land-use changes related to the density of roads, settlements, industries, vegetation covers have been also analyzed in GIS [25][26][27]. A number of atmospheric and climate studies have been conducted to understand the magnitude and distribution of greenhouse gas emissions and trends spatially (land surface structure) and temporally (over diurnal and seasonal timescales) under environmental and weather-dependent dispersion conditions [28][29][30]. Predictions of air quality are often merged with modeling techniques like GIS geospatial meta-analyses [12,31]. Through meta-analysis methodology and spatial data derived from GIS, the long-term environmental analysis would be considered further at a large geographic scale [32]. Most of these investigations use the GIS spatial interpolation techniques as a method of analyzing the variability in the CO2, CH4, and CO species. In particular, the urban CO2 distribution maps based on the original measured and modeled data are often developed to help predict the future air quality patterns in the cities [9].
The current study provides detailed comparison of the performance of five interpolation techniques to visualize and analyse the spatial variability and temporal dynamics of atmospheric CO2 concentration, including inverse distance weighting (IDW), spline, natural neighbour (NB) interpolation, interpolation based on a triangulated irregular network (TIN), Voronoi polygons, and the ordinary kriging method. The interpolated surfaces were created using the ArcGIS software within the Spatial Analyst and the Geostatistical Analyst series of interpolation techniques. Finally, each method was tested with different parameters: weights of sample point structure, a number of measured points included, the size and direction of the search radius, spatial autocorrelation structure; the sensitivity of each method to the input data characteristics was also evaluated.

Description of the Study Area
The measurements of atmospheric CO2 concentrations were conducted during the vegetation and heating seasons, from July 2017 to August 2018, within the area of Wroclaw, the fourth-largest city in Poland. Wroclaw covers an area of 292.92 km 2 ; city lies between latitudes 51°2′35″ N and 51°12′38″ N and longitudes 16°48′26″ E and 17°10′34″ E. According to data from the Central Statistical Office, there were about 640,000 inhabitants in Wroclaw in 2018 [33]. The topography of the city is characterized by a flat surface to gently rolling hills in the periphery. The terrain is higher in the western part, defined by an average absolute elevation of 148 m a.s.l. and low in the southeast in the city center, approximately 120 m a.s.l. Wroclaw is situated in the basin of the Oder River that flows in many channels and waterways forming the drainage network of a stream system [34].
The city has temperate continental climatic conditions, with a mean annual precipitation of 567 mm. The annual average temperature of Wroclaw is 10.2 °C (mean by decade 2008-2019), with the coldest month January (−0.1 °C), and the warmest July (20.2 °C) [35]. Warm and humid weather conditions are common in the summer; the winter season is relatively mild with moderate and changeable temperatures [36]. Prevailing wind directions are typically west and south with an average speed of about 2.5 m·s −1 , according to records collected by the meteorological station located at the Copernicus Airport Wroclaw [37]. Air temperature and amount of precipitation for the city of Wroclaw recorded by the meteorological station located at the Copernicus Airport Wroclaw between July 2017 and August 2018 are shown in Figure 1 [38]. Wroclaw's atmospheric pollution issues result mainly from two types of anthropogenic sources, caused by emissions from local household heating and vehicular traffic, as well as from large sources of emissions, such as power and industrial plants, nearby. The two combined heat and power (CHP) stations situated in Wroclaw (CHP "Wroclaw") and neighboring CHP "Czechnica" are most likely dominant origins of atmospheric carbon dioxide [39]. The local domestic heating, due to the use of poor-quality fossil fuels (coal, oil energy sources), are also main type of air pollution source [8]. Transport operation in the city is a rapidly growing source of energy-related carbon emissions, which constantly contribute to air quality issues in the surrounding areas.

Experimental Set-Up
In this study, the measured atmospheric CO2 levels were collected between 1 p.m.-5 p.m. local time (when the near-ground atmosphere was well mixed) by using the grab method (flask sampling) during a one-year period from July 2017 to August 2018 on a weekly basis. The air samples were taken into 1-L PTFE bags by using a vacuum pump at about 2 L/min of flow rate passing through a magnesium perchlorate glass water trap. The collected air samples were determined within 24 h using the Picarro CRDS (cavity ring-down spectroscopy) technique with model G2201-I [40]. In order to ensure the accuracy of the measurement results, instrumental calibrations were conducted on the high-accuracy gas cylinders (different levels of standard gases) using the linear calibration curve fits.
In the case of the limited observation network, the spatial interpolation analysis was covered at the district level in the east part of the city of Wroclaw ( Figure 2). The experimental plot in Wroclaw is particularly well suited for this project because it contains a complex mixture of anthropogenic and biogenic sources. The selected sites of the city of Wroclaw were chosen to represent the different degrees of anthropogenic influence on ecosystems in a city. The observations at these urban sites were combined with analysis of meteorological records, studies of local conditions, historical patterns of pollution rates and impacts of land-use changes. The aforementioned variables (meteorological data and topographic trends) were used in the optimization of interpolation results by considering anisotropy and interpolation parameters (e.g., neighbourhood type and number of sectors angle type, effective terrain weights, as specified in the ArcGIS Geostatistical Wizard extension), as well as in the analysis of the reflection of smooth/sharp changes on the interpolated maps.
Anisotropic semivariograms applied during the interpolation validation procedures were used to detected directional trends of seasonal CO2 concentrations by dominant wind routes. Moreover, the aforementioned procedures were undertaken to ensure that the high CO2 concentration events (interpolated surfaces with and without anisotropy) are dependent on the wind as a key meteorological factor that impacts the distribution of air pollutants is and also related to the weather events, e.g., anticyclonic circulations (lower wind speeds and calm conditions). Additionally, a variable search radius during cross-validation was determined based on a topography effect to cover the areas of emission spots (significant sources of GHGs and major roads).

Figure 2.
Location of study districts in the Wroclaw urban area. The measurement points are marked as: soil flux observation set/biogenic fluxes (violet pushpins), rooftop-level measurement sites (red points), location with on-site continuous diurnal series measurements (blue point); measurement stations operating by Inspectorate of Environmental Protection (dark red). The marks "RSP_1-6" indicate rooftop-level measurement sites, "FSP_1-3" soil flux sampling places, "CSt" means a location with continuous diurnal series measurements ("Cybulskiego station"), WIOS_1-2 are, respectively, measurement stations operating by Inspectorate of Environmental Protection. Base map source [41].
Several analyses have been performed in the investigation of CO2 emissions in the urban environment. The urban CO2 fluxes were estimated using a combination of different measurement approaches: the first one concerned the flask sampling (in gas sampling bags) of atmospheric CO2 mixing ratios, significantly influenced by fossil fuel component; the second one involved collecting samples for analysis of ambient CO2 concentration depending on the contribution of biospheric components (soil-surface CO2 fluxes and dynamics of photosynthesis) under various land use/land cover categories; the third one included continuous measurement of diurnal variations in CO2 concentrations.
The measurements of atmospheric CO2 were considered at three specifications: sampling points on the roofs; experimental sites that show natural emissions of CO2 within an urban location (in the vicinity of the soil respiration sampling plots); and measurements at the nearby automatic measurement stations of air pollutants for the Polish Chief Inspectorate of Environmental Protection. Following that, samples of ambient air were collected from the rooftop of the buildings at various altitudes (height of 15-30 m) at 6 experimental sites. Three plots were established over the area encompassing different biome types-the most typical urban ecosystems: grasslands, forests, arable fields. This study used datasets derived and measured at the nearby two automatic measurement stations of air pollutants for the Chief Inspectorate of Environmental Protection (in Polish, GIOŚ), showing variations in CO:CO2 ratios. Moreover, diurnal measurements of atmospheric CO2 concentration were performed from the laboratory building "Cybulskiego station" located in the city center and surrounded by a dense network of roads and close to the Oder River channel. The measurements at the "Cybulskiego station" were considered as independent check-point of urban CO2 concentrations in seasonal and diurnal cycles over a one-year period. The significant changes in concentration over short periods of time, often higher than the local background signals of CO2 in the heating season, were governed by local anthropogenic sources. Concentrations lower than the background CO2 time series usually reflect local sinks (biological CO2 fixation via plant photosynthesis) in the vegetation season.

Spatial Analysis and Modelling in the Geographical Information Systems (GIS)
The study investigates the relationships between field-determined seasonal CO2 concentration patterns and interpolated data at the specified sites across the study area, considering both related meteorological conditions and in-between the sampling points distance. The interpolated CO2 distribution is combined with a linear regression (LR) model of spatial and temporal trends of CO2 to estimate the impacts on a variance between time periods (for the vegetation and heating seasons) in response to terrain attributes and climatic variables of the area of interest. The approach is based on simple regressions that assume the relationship between observed mean CO2 concentration and effects of meteorological factors on both the temporal magnitude and spatial patterns of the carbon cycle. The resulting data (residuals of LR model) were then utilized to interpolate seasonal mean CO2 concentrations for the target locations and compared with original interpolator.
The observation size was examined by the distribution of CO2 on samples for the vegetation and heating periods. The Pearson correlation matrix was applied to estimate the relationships between the CO2 level and environmental variables (air temperature, relative humidity, atmospheric pressure, wind direction, and wind velocity) during the measurements. Multiple linear regression was also employed for each individual sampling site to examine the effect of meteorological factors on the seasonal cycle of CO2 in an urban area. The regression and correlation analysis performed using the least-squares method were conducted in the OriginPro 2019 software workspace.
The accuracy of observed and predicted CO2 distribution data within the study area is compared by applying GIS-based interpolation methods for point-sampled values by filling missing values in data sets of atmospheric CO2 monitoring. The interpolation analysis comprises hourly averages of atmospheric CO2 concentrations collected at the nine sites (roofs sampling points and flux sampling points) between July 2017 and August 2018; three sampling points (WIOS stations and Cybulskiego station) were used for verification purposes, as shown in Figure 2. The Tukey pairwise comparison between observations (t-tests: two-sample assuming) was used to evaluate the significance of differences in values of CO2 collected at each of the sampling points. A p-value (p = 0.05) was considered as statistically significant at measurement locations.
The analysis was carried out for the mean daily variation of CO2 levels in the study area. To capture the seasonal pattern of the CO2 concentration, the data attributed to each sampling location were further averaged. The averages for each measurement point were weighted in consideration of the site-specific data quality and completeness rate: missing observations were excluded from the dataset. The weighting factors used in the above calculation were geographic variables, such as topography, and land-uses of sampling sites (urban greenery represented by grasslands, forests, arable fields, e.g., "FSP_1-3" places) and emissions information, including the distance to the nearest major emission sources, and traffic intensity nearby (sites, located near heavy traffic roads, e.g., "RSP_1" and "RSP_6"), other observations with the pollutant of interest (e.g., "WIOS_1-2" measurement stations). Seasonal average CO2 concentrations taken at specific points were then interpolated. Predictions based on seasonal time scale were compared with monitoring data for the assessment of the spatial interpolation methodologies.
The different interpolation methods, such as inverse distance weighting (IDW), the natural neighbor approach, and interpolation based on a triangulated irregular network, Voronoi polygons were tested to analyze capabilities and select which method is best suited for fulfilling data gaps in CO2 measurements from selected nearby sampling locations. The various methods of interpolation work slightly differently, e.g., in a way to the shape of some mathematical function that is applied to the whole area of interest. IDW interpolation and spline are two deterministic methods that create surfaces from samples based on the extent of cell similarity or degree of smoothing [42]. However, a spline mathematical function crosses exactly through each sample point, while IDW crosses through none of the points [43]. In the IDW-based spatial interpolation method, the value at the unknown location (x, y), u(x, y) is calculated as the weighted average of the measurements at the monitoring sites, and the equation is (1) [9]: IDW is mostly used in pollution sampling and health science; kriging often used in soil science and geology, as well as in the field of atmospheric data analysis [12]. Spline interpolation is the best method to predict the smoothly varying surfaces of phenomena like temperature [44]. The interpolated value at the spline algorithm can be expressed as (2) [45,46]: where j = 1, 2, ..., N; N-the number of points; λj-coefficients found by the solution of a system of linear equations; rj-the distance from the point (x, y) to the jth point. There are two spline methods: regularized and tension. A regularized method creates a more elastic, gradually changing surface, while a tension spline usually creates smoother surfaces of the same sample points [47]. The tension curve is flatter than the regularized curve and forcing the estimates to stay closer to the sample data. The tension spline uses only first (slope) and second derivatives (rate of change in slope), when regularized includes the first second, and third derivative (rate of change in the second derivative) in the Spline computing [47]. Spline with a tension function is given by (3) [48]: where r-distance between the prediction point and the sample; σ-tension parameter; E1-exponential integral function; CE-constant of Eulero (0.577215); K0-modified Bessel function. The triangulated irregular network (TIN) interpolation typically used for high-precision modelling of smaller areas, e.g., in engineering areas [49]. Natural neighbour interpolation is based on the Voronoi tessellation of irregular distributed points, a network/region generated around each location in the data set can be written as (4) [19,50]: where si the natural neighbours of q and V(si) their associated regions in the original Voronoi diagram.
A geostatistical method, such as kriging, is one of the most complex interpolators and a powerful statistical technique based on sophisticated weighted average techniques for predicting values derived from the measure of relationship in samples [51]. The method considers both the distance and the degree of variation between sample points when estimating values of unknown samples. The distance and direction between all possible pairs of data points are calculated to provide data on the spatial autocorrelation of the sample point set on a particular surface [49]. The general mathematical formula for a kriging interpolation scheme using weights independent of the data is defined as (5) [52,53]: where Z(uα) is the random variable model at location uα; the uα's are the n data locations, the λα(u)'s are the ordinary kriging weights for estimation of Z×(u).
Choosing the correct type of interpolation method depends on many factors e.g., the nature of the observed variable and the representation period; types of surface for modelling; and the features of the geographic area [54]. The characteristic of pollutants affects the determination of the interpolation method to use. The difference in variance between different data sets provides information with regard to the level of data smoothing in the interpolated surfaces. For instance, the Thiessen and IDW techniques stay in the range of the original observations, while kriging and spline can exceed the observed range and produce interpolated values both below and above the measured ones [55,56].
The quality of the sampling set also affects the choice of the interpolation method. The kriging interpolation is more sophisticated than other methods, however, the limited number of sampling points, non-regular datasets, and clustered location of points could affect the quality of the interpolated outputs [57]. In the case of poorly distributed or few sampling points within the study area, then the application of the IDW method is better [58]. The spline interpolation does not work well if observation points are close together and have extreme differences in values [47].
Prediction accuracy of different interpolation methods is often evaluated by a cross-validation technique. The idea of cross-validation is removing one data point and performing the interpolation using the data at the remaining locations [47,59]. Furthermore, a calculation of residuals between the measured value of the removed point and its estimation is executed. The procedure is repeated until each observation in turn out of the sample has been estimated from the remaining observations [60]. After completing cross-validation, some data points can be rejected, such as that contains large errors or requires refitting the trend and autocorrelation models.
The spatial autocorrelation between points must be quantified through the semivariogram model: circular, spherical, exponential, Gaussian, linear [61]. When the semivariance is plotted against the lag distance or separation distance between points, the plot is called semivariogram [59]. The semivariance function characterizes the spatial continuity between points and estimates spatial dependence between observations. These spatial structures of semivariogram help in identifying autocorrelation, replicating samples and finding the optimal parameters of interpolation control parameters [61]. The model error for each interpolation in cross-validation is evaluated by statistical means, such as the mean error (ME) and root mean square error (RMSE) [60]. RMSE indicates an interpolator that calculates the most reliable estimates at an unsampled location and shows a dominant pattern in the data series [62].

Summary of the Measured CO2 Concentrations
The analysis in this study is based on the average hourly time series of atmospheric CO2 concentrations measured at the surface level. The observation period was divided into the vegetative and non-vegetative phases according to the meteorological data. The vegetative stage (vegetation I: 6 July 2017-7 November 2017 and vegetation II: 29 March 2018-2 August 2018) was defined as the period when the daily mean air temperature increases above +5 °C, whereas the heating stage (8 November 2017-28 March 2018), refers to the phase of low temperatures (drops below +5 °C) when the need for home heating becomes quite high. The measured patterns of atmospheric CO2 concentrations followed a normal distribution for both periods (vegetation and heating), whereas over the whole study period (vegetation I, heating, vegetation II seasons) data were not significantly right-skewed from a normally distributed population (tested with the Shapiro-Wilk test). The overall distribution of observed annual CO2, however, provides statistically acceptable fits to the set of observations, the data kurtosis of a normal population curve was not reported as tested with the Shapiro-Wilk test (p < 0.05) in OriginPro 2019 software.
The time-series of measured CO2 levels indicated clear seasonality as a result of natural processes and human activity. In the vegetative season, the biological impact through the photosynthetic carbon fixation is dominant while the anthropogenic impact is minimized. Conversely, in the heating season, the anthropogenic CO2 sources are predominant, while photosynthesis uptake of CO2 by cover vegetation is negligible. According to the seasonal cycle of carbon dioxide, for most of the measurement sites, a marked increase in CO2 concentrations was observed in late autumn (from November) caused by a decrease in air temperature and related to higher energy consumption for electricity and heating purposes. The decline in concentration was recorded in early spring (at the end of March and April) due to the longer duration of sunshine and coupled changes in the vegetation distribution (plant respiration and uptake). The measured average concentrations for atmospheric CO2 ranged from 380 to 440 ppm with a mean of 412 ± 3.1 ppm for the vegetation season; and the maximum concentrations for CO2 were observed in the heating season, in the range of 410-470 ppm with a mean of about 430 ± 3.8 ppm in winter ( Figure 3). Thus, the derived seasonal variation in atmospheric CO2 concentration reaches up to 25-30 ppm, which is consistent with the majority of previous studies [63][64][65][66]. The signals of measured CO2 for the study period also illustrate temporal variability in CO2 values over the sampling sites. Figure 4 shows the box plots of the median and interquartile range of CO2 concentrations at each sampling point. Analysis of variance (ANOVA) and a pairwise comparison Tukey test (p > 0.05) indicated a significant difference in the carbon dioxide rates observed for most of the study sites probably due to the distance between the sampling locations, and the influence of local sources (mainly transport activity and domestic heating sector). When analysing the differences separately for the vegetative and non-vegetative period, the seasonal pattern of CO2 concentrations was quite evident. At all measurement locations, atmospheric CO2 concentration was found to be higher in the heating period (from autumn to winter).
The compared means of atmospheric CO2 across the majority of the study sites were relatively close and do not demonstrate extreme values in the vegetation season ( Figure  4). A peak in CO2 concentration (median mean of 414 ppm) was only measured at the "WIOS_2" on-road traffic station. Although slightly higher concentrations were measured in the "RSP_1" and "RSP_6" sites, located near heavy traffic roads. Three stations ("RSP_3", "CSt", "WIOS_1") that were furthest from the heavily congested roads and associated with the urban greenery showed the lowest concentrations of CO2 (median mean below 397 ppm) observed during the vegetation season. From the results of pairwise comparisons of recorded CO2 in most of the observation locations, it can be concluded that there was a significant difference in CO2 measurements in the heating season. The winter CO2 maximum was also observed in the "WIOS_2" measurement station (median mean over 450 ppm). For some sites, such as "RSP_1" and "RSP_6" the measured atmospheric CO2 concentrations were characterized by peaks (median mean up to 440 ppm), predominantly caused by emissions from anthropogenic sources. Furthermore, the amplitude of the CO2 concentrations measured in the heating season indicated that the meteorological episodes (mainly, air temperature fluctuations) contributed significantly to the high variation in CO2 values within sampling groups (interquartile range on box plots). The difference in the CO2 levels within and between the treatments (vegetative and non-vegetative period) can be explained by the contribution of different physical and physiological processes (respiration and photosynthesis components), complex emission source configuration and other multiple human-induced factors. The higher variability in meteorological conditions (duration of vegetation period, photosynthetically active radiation, precipitation, wind speed and direction) had a direct influence on the temporal patterns of atmospheric CO2 during vegetation and heating seasons. The results of the measurements of atmospheric CO2 were correlated with the meteorological conditions, represented in Table 1. The Pearson correlation indicated moderately inverse correlation with air temperature (r = −0.43 and ρ = −0.41 for Pearson and Spearman statistics, respectively, p < 0.05) and a direct correlation with relative humidity (r = 0.28 and ρ = 0.22 for Pearson and Spearman statistics, respectively, p < 0.05). The CO2 concentrations were positively correlated with atmospheric pressure (r = 0.17 and ρ = 0.24 for Pearson and Spearman statistics, respectively, p < 0.05) and associated with a negative weak relationship with wind speed and direction (r = −0.22 and ρ = −0.19 for Pearson and Spearman statistics, respectively, p < 0.05). The observation size was examined by the multi-factor ANOVA, with the following model: y = −326.104 − 0.014 × x1 + 0.076 × x2 + 1.168 × x3 + 0.160 × x4 − 0.104 × x5 (where x1-air temperature, x2-air humidity; x3-atmospheric pressure; x4-wind velocity; x5-wind speed); and indicated that the air temperature parameter was the only key factor controlling the intensity of atmospheric CO2 for the vegetation and heating periods (F < 0.05, df = 711, p < 0.05). In general, the regression model results revealed that the local distribution of CO2 is dependent on the weather patterns, however, some meteorological influences were more specific and their effects were less obvious.

Geostatistical Analysis of Temporal and Spatial Variability of Atmospheric CO2 Concentrations
According to the results obtained from different interpolation methods, the quality of various predicted surfaces for the vegetation and heating seasons were identified. IDW, spline, and natural neighbour (NB) approaches showed some similarities in the produced surfaces. However, natural neighbour prediction led to accurate values mainly within a short distance in sample points location. IDW and spline interpolation gave a consistent, but poor performance for the seasonal CO2 distribution since the surface is highly susceptible to edge effects (unsampled locations) and clusterization (clustered sample points). The lowest and highest values obtained by Voronoi (Thiessen) polygons were representative of the original data range, however, the prediction had a much greater morphology of the map (high differentiation between colour ranges) and did not represent realistic dispersion conditions for CO2 emission.
In general, the IDW and spline methods can be accepted for the analysis of CO2 variability in Wroclaw over the vegetation and heating seasons compared to the natural neighbour interpolation and Voronoi (Thiessen) polygons. Prediction of seasonal changes in the atmospheric CO2 concentration using spline and IDW techniques had a relatively accurate level. Regarding the average prediction bias, the low ME values were obtained for the spline technique, which suggests low interpolation errors compared to the chosen interpolation methods. The predictive ability of the IDW method was generally lower compared to spline predictions (higher ME, RMSE values), probably due to a slight underestimation of the observed concentrations. However, it is also worth noting that IDW proves to be effective in identifying pollution hotspots from the interpolated surfaces; moreover, spline tends to have a 'smoothening' effect (see Figures 5 and 6).
Every location (points) within a Thiessen polygon, due to the missing data at an unsampled area, had very little obvious trends within the dataset. The range of results obtained by IDW and spline interpolation was characterized by the minimum error variances (Table 2). However, the IDW approach strongly averaged data and did not produce values higher than the maximum and smaller than the minimum values of the observations. Overall, the differences of RMSE between IDW and spline were generally small, but IDW tended to higher errors, probably due to the uneven distribution of input points over the study area (edge effects and clusterization). The lower errors were introduced by natural neighbour estimating, as the interpolated values were close to the observed ones. However, the values predicted by natural neighbour were obtained only for the central part of the study extent, up to the contour of the data points. The absence of extrapolation results affected potentially large errors in the area outside sampling locations by the natural neighbour interpolation method. The results showed that the choice of the correct type of interpolation method depends on many factors including the nature of the observed variable, size of the sample dataset used on spatial scale and the period on which it is represented. The interpolated surfaces (Figure 3; Figure 4  The IDW interpolated surface (Figure 7) represents the best quantitative and accurate results of CO2 dispersion in the atmosphere compared with other interpolators in this study. The predicted mean CO2 values were highest in the heating season (nearly 431

Interpolation method results:
Heating season ppm) compared to the vegetation (mean 411 ppm). However, the IDW method works best for the dense enough, evenly-spaced sample point sets, that is why the formatted feature class cannot be sufficient to capture local CO2 concentration variations in the Wroclaw urban area. We studied the anisotropy by managing search direction and showing its impact on the spatial correlation structure for a given, generally low number of observations (vegetation vs. heating season). The direction of anisotropy was determined by estimating semivariograms in multiplane directions. The semivariograms and semivariogram maps with search direction used to determine the presence and direction of anisotropy in the data in the vegetation and heating season (pattern of spatial correlation to be stronger in certain directions) are shown in Figure 8. Clear anisotropy that was found with direction labeled in semivariogram surface/maps, especially in the heating season, might suggest the spatial variability of the observed data. In general, the spatial correlation of the data in the direction perpendicular to that represented in Figure 8 is lower than that parallel to it, whereas the anisotropic model reaches the sill more rapidly. Normally, with the semivariogram, the variation of the values should increase with distance, but in this case, the clouds are not illustrating the expected results. The tendency of the geographic relationships is not within the expected range mainly due to the character of data collection. Some points are unevenly distributed, also located on the edges of the measured network or more influenced by neighboring points when they are more clustered. The semivariogram for the IDW method represents the high average rate of change of data property with distance and not significant consistency of the variable through space (Figure 9).  The level of model accuracy was tested also through semivariogram estimates of spatial dependence between observations for a particular spline-created surface. This information again raises the issue of the sampling distribution tendency, as well as reflecting non-uniform geometrical configuration and spatial resolution of the sampling points ( Figure 11). Figure 11. The semivariogram graph of the predicted points for average CO2 variation with distance, obtained from the spline tension interpolated raster.
The Natural neighbour (NB) interpolation method was used as it generally works well with clustered scatter points [18]. A disadvantage of this technique is a limited interpolated surface up to data point contour; it does not extrapolate outside of the data locations. The natural neighbour method gives good results within uneven point distribution ( Figure 12); it estimates the CO2 concentration in a similar pattern compared to the spline interpolation (an equal part of the produced surface). The general picture of the presented pollutant distribution on the right side and in the central part of the work extent is quite similar that obtained by the previous method. However, the absence of extrapolation results affects potentially large errors in the area outside sampling locations. The spatial autocorrelation of the data points in the natural neighbour interpolated surface is visualized by the semivariogram in Figure 13. The semivariogram also represents the high average rate of change of the data property with distance and non-significant consistency of the variable through space. Figure 13. The semivariogram graph of the predicted points for average CO2 variation with distance, obtained from the natural neighbor interpolated raster.
The Voronoi diagram with the Thiessen polygons was used to illustrate the stationarity of the dataset. Each Thiessen polygon contains only a single point input feature and represents the relationship for the data to their environment. Every location within a Thiessen polygon was closer to its associated point than to the input point of any other polygon. The trends shown in the Voronoi map ( Figure 14) demonstrate the high variation with the dataset for the study area. The darker colors represent the higher rates of change of the CO2 concentration for the neighboring point dataset. The projected area due to the missing data at an unsampled location from the neighbouring observations had very little obvious trends within the dataset. This trend spreads in sites with no da-ta-mostly north and south and northeast. Furthermore, this method showed the highest differences in observed and estimated CO2 values between vegetation and heating seasons. Figure 14. GIS coupling mapping of spatial CO2 distribution in the vegetation and heating seasons, a surface produced using the Voronoi diagram with the Thiessen polygons method.
The raster surface created with the spatial analyst tool in GIS software does not have semivariogram estimates of spatial dependence between observations for a particular Thiessen Polygons. The statistical model was estimated by calculating mean prediction error (MPE) and root-mean-square prediction errors (RMSE). The experimental results showed the main factors affecting the accuracy of interpolation are sampling density, sampling mode, and sampling location. The specific sampling datasets that do not reflect the complexity of the spatial characteristics rapidly increase errors of interpolation methods and the estimated values could be under-/overestimated. Organizing the sampling mode plays a key role in future predictions, for example, in regular-grid sampling (with cells of equal size), the spline interpolation method has the highest interpolation accuracy, and the IDW has the lowest interpolation accuracy.

Sampling Strategies for Optimization of the Effect of Interpolation
It is not easy to determine which interpolation is better by looking only at the spatial pattern of the interpolated surfaces. Multiple tools and approaches exist to answer the question about the differences among various methods. Several studies have demonstrated that the various spatial interpolation techniques perform differently depending on the type of inputs, the geometrical configuration of the samples, and spatial resolution [67][68][69]. In general, the performance of the kriging technique is better than the other studied interpolation methods, but it is more effective in predictions when the number of input points is higher in the monitoring network. The predictions of the kriging interpo-lation are greatly affected by the sampling configuration and spatial orientation of points of interest in the area under study.
The best-interpolated outputs may only be when datasets used contain a sufficient number of sample points, evenly or high-density located in the study area. The limited number of observation points, as well as the isolated locations, both had a critical influence on the estimated CO2 distribution surface. The performance of chosen interpolators in the absence of spread datasets seems restricted for high-quality predictions. Unmeasured locations have the most impact on interpolated values and significantly entail the largest errors in the surrounding surface. As a result, it is critical that additional sampling points should be incorporated into further spatial estimates, especially in more problematic parts of the study area (edges and central). Regardless of the limitations of kriging interpolation on a number of data locations, the ordinary kriging prediction standard error map was developed in this study for a better representation of the spatial nature of atmospheric CO2 variations ( Figure 15).
The design of the new monitoring locations was estimated using the following GIS interpolation methods: kriging technique (a prediction standard error map), create random points, and densify sampling network geoprocessing tools. To decide, which areas have the highest influence on the prediction surface, the prediction standard error (a kriging geostatistical layer) was computed. Furthermore, the spatially continuous prediction error map, associated with the kriging layer, was used as the selection criterion in determining the new sampling sites. The additional monitoring points in areas that are void of air quality measurements were distributed randomly by applying the random sampling approach with the optimization criteria (input weight raster). For this purpose, we prepared a probability raster taking into account additional layers such as land-use characteristics and large-size emission sources allocation and their combination through the use of the appropriate map algebra expressions. In this approach, the new sampling design depends not only on the monitoring network density but also on the importance of land uses, where the probability value is higher. Consistently, the prediction surface created by interpolation was improved by applying the most suitable error statistic layers to determine where the new monitoring sites should be built. The next step for spatial analysis of these parameters was to form the empirical semivariogram that provides information on the spatial autocorrelation of datasets. The experimental semivariogram of a dataset for the vegetation and heating periods with its semivariogram curves is shown in Figure 16. There can be observed deviations of the points on the semivariogram from the averaged values-some points are above and below of the model curve. These variations could be the result of direction (anisotropy) trends, labeled on semivariogram surface/maps (see maps in a corner), or some unmeasurable physical process of pollution dispersion. Spatial optimization from the kriging model indicates that there is a need to locate additional sampling points, especially in southern and southeast directions (values of the highest standard errors of prediction). A random sampling strategy was used to determine the new distribution of sample points by the reconstruction of the air sampling network density, especially in unmeasured locations. By using kriging interpolation ( Figure 17) and the random points tool in ArcGIS software the error map was created, where the future sampling campaigns can be planned and a more realistic representation of CO2 distribution across the most problematic parts of the study area should be achieved.
This strategy would avoid biases related to data variability in the study area as compared to sampling at regularly spaced intervals. However, a random sampling strategy is largely dependent on the number of points and may not represent the important features of the data in complex terrain (e.g., local conditions, land cover, scope and sources of emission, spatial emission processes, etc.) and requires additional field inspections.
With the change of the sampling mode from regular-grid sampling to combined sampling, significant improvements in interpolation predictions were accomplished. Due to the much better positioning of sample locations in feature space, the reduced interpolation-error variance of the residuals was achieved. In the right-hand corner of the kriging predicted surface, the sampling intensity was increased to ensure adequate coverage of the observation network. The new sample locations were created in the bottom right and top right sides of the work extent to more effectively examine this area and shorten sampling intervals. Finally, the results obtained from the map of the optimal sampling scheme will lead to a better understanding of the spatial distribution pattern of pollutants in the city of Wroclaw and defining zones with differentiated intensity. The performance of the interpolation techniques can be further improved by having a sufficient observation network with reasonable precision and including parameters related to the variable environmental parameters (e.g., wind, weather patterns). The critical requirements to more accurately capture the temporal and spatial variability of pollutant concentrations are high-density observational data conducted for a long-term period on a regular basis (e.g., daily). Since our measurements of CO2 levels were based on irregular sampling at a few sampling locations, we propose to include in the future sampling strategy unmeasured locations by discovering different spatial conditions (land-use characteristics) regardless of the actual distance to the surroundings of the existing monitoring stations, covering main emission sources in the study area, and capturing dependencies (spatial correlation) between the dispersion of air pollutants and meteorological data.
The future sampling strategy is based on searching for the problem features in areas of high uncertainty. Spatial optimization and validation procedures answered the research question of this part of the study, i.e., how many sample units (points) should be located to decide which interpolation technique is the most suitable for representation spatial and temporal distribution of atmospheric CO2 in given geospatial features, local climate, and meteorological conditions. In general, finding the optimal design of the spatial air quality monitoring network is a significant issue. The advantage is that an optimal sampling strategy leads to fewer sampling points and more cost-effective monitoring. The fact of this matter is that the well-planned urban-scale CO2 in situ emission observations with a focus on emissions according to source and geographic patterns, also comparing the measured data with city emission inventories, are strongly recommended.

Conclusions
Spatial interpolation methods are widely used in various fields of science for monitoring, mapping and spatial analysis of air pollution distribution. Five interpolation techniques (inverse distance weighting (IDW), spline, natural neighbour (NB) interpolation, interpolation based on a triangulated irregular network (TIN), Voronoi polygons, ordinary kriging) were compared in the present study for their ability to accurately produce the interpolated surface of atmospheric CO2 concentrations. Each interpolation method was tested for effectiveness in determining the CO2 distribution in the city of Wroclaw in two different periods (vegetation vs. heating seasons) from July 2017 to August 2018. The results obtained demonstrate that the acceptable quantitative and ac-curate results are obtained by IDW and spline interpolation compared to natural neighbour interpolation and Voronoi (Thiessen) polygons. The interpolated surface of CO2 level differs by almost 30% depending on the observation period (highest in the heating season, nearly 431 ppm, and lowest in the vegetation with mean of 417 ppm).
Because of the limited observation network which partly covers the analysed districts in the present case, there is no confirmed optimal interpolation method, only the optimal choice under certain circumstances. According to cross-validation estimates, the spline tension interpolation is identified as the most suitable tool for spatial, as well as temporal modelling of pollution in the studied geographic area. The high distinction between observed and estimated (interpolated) values for all surfaces was observed as the predicted surface is highly exposed to the present edge effects (unsampled sites) and clusterization (congestion of the sample points). This suggests that the location used to account for variations in atmospheric CO2 plays a critical role. The findings demonstrate that the interpolators should be used when the set of points is dense enough to capture the full extent of local surface variation.
In this study, the new sampling strategy was based on searching for the problem features in areas of high uncertainty. The performance of interpolation techniques was improved by the reconstruction of the air sampling network density, especially in unmeasured locations. By using kriging interpolation, the error map was created to determine where more information is needed when future sampling can be planned if necessary. Furthermore, the optimal number and the new randomly selected locations of sampling points were determined. In this case, the kriging technique was used not only for interpolating values in the site but also to directly estimate the degree of uncertainty associated with a map prediction. As a result of optimization, the more realistic representation of CO2 distribution across the more problematic parts of the study area was achieved. This is due to the much better positioning of sample locations in feature space, which results in a reduction of the interpolation-error variance of the residuals.
In general, based on the GIS tools, the results of in situ measurements can be represented as the data interpretation and visualization of related time series of air CO2 concentrations. Then, the urban air pollution maps with used parameters can accurately identify the areas of high concentration of GHGs, which would likely enable cities to manage their efforts better and set realistic targets of emission reduction. This study, however, left several questions unanswered, in particular regarding the optimal interpolation technique for the specific geospatial features, distribution of local emission source types (anthropogenic and biogenic) and contributions of atmospheric transport to the observed seasonal cycles of CO2. The fact of the matter is that the optimization of the air quality monitoring network for carbon dioxide for a medium-large city is a significant issue and requires advanced interpolation and modelling methods to handle time series with large gaps in measurements.