Spatiotemporal Patterns of Air Pollution in an Industrialised City—A Case Study of Ust-Kamenogorsk, Kazakhstan

Abstract: Air quality issues still affect the quality of life of people in industrialised cities around the world. Investigations should include the identification of the sources of pollution and its distribution in space and time. This work is the first attempt to identify the sources of pollution in the city of Ust-Kamenogorsk, Kazakhstan. Retrospective data for ten variables (TSP, SO2, CO, NO2, phenol, HF, HCl, H2SO4, formaldehyde, H2S) from five monitoring stations for the period 2017–2021 were analysed using multivariate statistical methods and hierarchical cluster analysis to assess spatiotemporal patterns of air quality in the city. The results indicate that the contamination patterns can be grouped into two categories: cold and warm seasons. The study revealed dangerous concentrations of NO2 and SO2, which exceeded the limits by 2–3 and 1.5–2 times, respectively, independently of the season. Averaged concentrations of TSP slightly exceeded the established limits in the most industrialised part of the city. Concentrations of HF and formaldehyde rose significantly during the cold seasons compared to the warm seasons. Other chemical parameters depend significantly on the season and the locations of the sampling points. The major reason for air pollution is twofold: the burning of coal throughout the year for electricity and heat generation (especially during the cold seasons) and the high density of heavy metallurgy industry in the city. The principal component analysis confirms a high loading of industrial sources of air pollution on both spatial and seasonal dimensions.


Introduction
In 2022, humanity still faces air pollution issues, although much is known about efficient ways to handle them. Statistics reveal significant pollution in big cities, even in countries where permanent monitoring with a wide network of monitoring stations has been established [1]. At the same time, the situation can be much worse in developing countries, especially in industrialised cities, as these cities have not been connected to a common database and the ability to assess environmental and social consequences is therefore limited. Studies indicate that, despite direct hazards to health and the economy, the influence and interests of big manufacturing and energy companies can sometimes prevail over these issues and exert pressure on regulation [2]. These factors keep air pollution a relevant issue around the world. For instance, a study by Yuan et al. [3] claims that cities with pollution-intensive industries are responsible for urban air pollution (called "super emitters" as they dominate as large point sources of pollution [4]). According to another study by Gu et al. [5], the industrial sector has been where the system of permanent monitoring with extended parameters, such as PMs, has still been established [32].
Under these conditions, multivariate statistical techniques, particularly Principal Component Analysis (PCA) and Cluster Analysis, can be an efficient way to investigate source apportionment and, subsequently, to understand the steps needed for air quality management. In addition, the analysis of scores and loadings in PCA can give a representation of potential chemical reactions in ambient air [33]. PCA has already been applied several times to assess patterns in air quality. For instance, Dominick et al. [34] investigated possible sources of air pollutants and spatial patterns in Malaysia using the same techniques as in the present study, grouping particulate contaminants into principal components. Application of PCA to find the correlation between gaseous pollutant concentrations, meteorological factors and potential sources of pollution has identified the contribution of combustion- and non-combustion-related emitters in Greece [35] and in India [36]. The tool has also been used for the same purpose by Azid et al. in the context of the prediction of air pollution [37]. PCA has supported the assessment and identification of the sources of air pollution in India, where complex industrial activities exist [38]. Núñez-Alonso et al. [39] revealed information about the sources and mechanisms of air pollution in Madrid and visualised their spatial distribution using PCA and a geostatistical method. Particular attention has been paid to the presence of metals only in particulate matter, investigated with multivariate statistical techniques, in an Iranian industrial city [40]. For example, combining the analysis of the chemical composition of fine particulate matter, with a focus on the presence of metals, with Absolute Principal Component Analysis allowed the identified pollutants to be attributed to their sources in the USA [41].
The aim of this study is to analyse key factors impacting air quality in Ust-Kamenogorsk on spatial and temporal scales by applying multivariate statistical techniques to the available dataset for the period 2017-2021. This approach makes it possible, for the first time, to investigate source apportionment using the large but limited dataset of observations in the city.

Study Area
Ust-Kamenogorsk (or Oskemen) is located in northeastern Kazakhstan in the foothills of the Altay, at the confluence of the Irtysh and Ulba rivers. The climate of the Ust-Kamenogorsk region is temperate continental. The city is divided into two parts by the Irtysh River and is surrounded by the Shanovsky and Kalbinsky mountain ranges on its southeastern side [42].
A number of the largest Kazakhstani metallurgy plants for the production of nonferrous metals are located in the city: the metallurgical complex of Kazzinc LLP, the Ulba metallurgical plant, and the Ust-Kamenogorsk titanium and magnesium plant. Coupled with the Ust-Kamenogorsk and Sogrinskaya thermal power plants, these industrial activities put significant pressure on the air quality of the city. The locations of the main industries are presented in Figure 1. The main characteristics of the industries of the city are presented in Table 1.
The city can be conditionally divided into three zones: two big industrial areas (the northern and the northeastern industrial zones), and downtown, located on the left bank of the Irtysh River, with its own thermal power plant. The northern industrial zone includes the Ust-Kamenogorsk metallurgical complex of Kazzinc LLP, the Ulba metallurgical plant, and the Ust-Kamenogorsk thermal power plant. The northeastern industrial zone includes the Ust-Kamenogorsk titanium-magnesium plant and the Sogrinskaya thermal power station. Accordingly, five monitoring stations (noted as S1-S5 in Figure 1) are located for sampling and analysis of the air quality:
- Station 1 is located in the northern industrial zone;
- Station 2 is located in the administrative centre of the city;
- Station 3 is located in the north-western part of the city, adjacent to the northern industrial zone;
- Station 4 is located in the northeastern industrial zone;
- Station 5 is located downtown.

Multivariate Statistical Techniques
Correlation analysis, principal components analysis (factor analysis), and hierarchical cluster analysis were applied to identify the multivariate relationships between different variables and samples in the study area. The dataset was normalised for the elimination of the effect from differences in units (Equation (1)).
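The equation itself is missing from this text. Assuming the standard z-score form implied by the definitions that follow, Equation (1) can be written as:

```latex
Z_{ij} = \frac{x_{ij} - m_i}{SD_i} \qquad (1)
```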
where Zij are the normalised values of xij, i is the variable index, j is the sample number, mi is the mean value, and SD is the standard deviation of the sample. The relation between each pair of variables was measured by Pearson's correlation coefficient to determine the associations among different variables. Correlation coefficients greater than 0.5 were considered significant. PCA extracts the most significant parameters from a large dataset of inter-correlated parameters and creates independent variables (Equation (2)).
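The equation itself is missing from this text. Assuming the usual linear-combination form of a principal component, which matches the symbols defined next, Equation (2) can be written as:

```latex
z_{ij} = a_{i1} x_{1j} + a_{i2} x_{2j} + \dots + a_{im} x_{mj} \qquad (2)
```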
where z is the component score, a is the component loading, x is the measured value of a variable, i is the component number, j is the sample number, and m is the total number of variables. Factor analysis (FA) is a similar approach to PCA. However, PC is presented as a linear combination of parameters. FA follows PCA and takes into account unobservable, hypothetical, latent variables. They are included in the equation with the special residual term (Equation (3)).
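The equation itself is missing from this text. A plausible reconstruction, assuming the common factor model and an additional index j for the measured variable (j is not defined in the source for this equation and is an assumption here), is:

```latex
z_{ji} = a_{j1} f_{1i} + a_{j2} f_{2i} + \dots + a_{jm} f_{mi} + e_{ji} \qquad (3)
```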

where z is the measured variable, a is the factor loading, f is the factor score, e is the residual term accounting for errors or other sources of variation, i is the sample number, and m is the total number of factors. Cluster analysis was used to group the monitoring dates based on similarities between their variables. Hierarchical agglomerative CA was performed with Ward's linkage; the linkage distance is reported as Dlink/Dmax, the linkage distance for each case divided by the maximal linkage distance. The resulting dendrogram makes the similarities easy to analyse. Ward's linkage and the Euclidean distance as the similarity measure are commonly used in cluster analysis for the assessment of air quality [43].
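The clustering step described above can be illustrated with a minimal Python sketch. It is not the authors' procedure (they report using SPSS); it is a naive agglomerative implementation of Ward's criterion on Euclidean distances. The function names `ward_linkage` and `relative_linkage` and the demo points are hypothetical, and the distance convention (square root of twice the increase in within-cluster sum of squares) is an assumption chosen so that two single points merge at their Euclidean distance.

```python
import math

def ward_linkage(points):
    """Naive agglomerative clustering with Ward's criterion.

    Returns a list of merges (members_a, members_b, distance), where the
    distance is sqrt(2 * increase in within-cluster sum of squares).
    """
    # Each cluster is stored as (member indices, centroid, size).
    clusters = [([i], list(p), 1) for i, p in enumerate(points)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                _, ca, na = clusters[i]
                _, cb, nb = clusters[j]
                sq = sum((a - b) ** 2 for a, b in zip(ca, cb))
                cost = na * nb / (na + nb) * sq  # Ward's merge cost
                if best is None or cost < best[0]:
                    best = (cost, i, j)
        cost, i, j = best
        ma, ca, na = clusters[i]
        mb, cb, nb = clusters[j]
        centroid = [(na * a + nb * b) / (na + nb) for a, b in zip(ca, cb)]
        merges.append((ma, mb, math.sqrt(2 * cost)))
        del clusters[j]  # j > i, so delete j first to keep index i valid
        del clusters[i]
        clusters.append((ma + mb, centroid, na + nb))
    return merges

def relative_linkage(merges):
    """Rescale linkage distances to Dlink/Dmax."""
    d_max = max(d for _, _, d in merges)
    return [d / d_max for _, _, d in merges]

# Made-up observations (rows = dates, columns = contaminants), illustration only.
demo_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
demo_merges = ward_linkage(demo_points)
print(relative_linkage(demo_merges))
```

In this toy example the last merge joins the two well-separated pairs, so its Dlink/Dmax equals 1 and the earlier merges appear near the bottom of the dendrogram.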
All mathematical and statistical computations were performed using Microsoft Office Excel 2016 and IBM SPSS Statistics 26 software.
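As a rough illustration of the normalisation and correlation screening described in this section, the following Python sketch uses only the standard library. It is not the authors' Excel/SPSS workflow. The function names, the use of the sample standard deviation (n - 1), and the demo values are assumptions; the 0.5 threshold is taken from the text.

```python
import math
import statistics

def zscore(values):
    """Normalise one variable: Z_ij = (x_ij - m_i) / SD_i."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample SD (n - 1); an assumption
    return [(x - mean) / sd for x in values]

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def significant_pairs(data, threshold=0.5):
    """Return variable pairs whose correlation coefficient exceeds the threshold."""
    names = sorted(data)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(data[a], data[b])
            if r > threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs

# Made-up daily means, for illustration only (not measured data).
demo = {
    "SO2": [0.10, 0.12, 0.15, 0.11, 0.18],
    "TSP": [0.20, 0.25, 0.31, 0.22, 0.35],
    "HCl": [0.05, 0.02, 0.04, 0.06, 0.01],
}
normalised = {name: zscore(vals) for name, vals in demo.items()}
print(significant_pairs(normalised))
```

Pearson's coefficient is invariant to the z-score transformation, so the screening gives the same pairs on raw or normalised values; the normalisation matters for the distance-based steps.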

Data Management and Methodological Framework
The National Hydrometeorological Service of Kazakhstan "Kazhydromet" provided the raw dataset for this study. The equipment, laboratory, staff, and methodology of analysis are certified and follow national and international standards for QC/QA. The dataset contains the results of manual measurements of 17 contaminants. Most of these measurements were carried out four times per day at five stations (Figure 1). To obtain the most detailed picture, ten contaminants were chosen for spatiotemporal and statistical assessment for the period 2017-2021: total suspended particles (TSP), SO2, CO, NO2, phenol, HF, HCl, H2SO4, formaldehyde, and H2S. These contaminants and this period were selected because they offer the fullest available dataset: they were measured on a regular daily basis, except for weekends and vacations. It is important to note that TSP is not a commonly used parameter worldwide, while data on the more common indicator, particulate matter (PM), were not provided because they are not available. Information about temperature, wind speed, wind direction, and humidity was obtained from the archive of the web resource www.rp5.kz (accessed on 15 November 2022) [44] to identify possible interconnections with the meteorological conditions of the region.
The first step of the analysis was to identify periods that are potentially similar in terms of contamination. This step consisted of hierarchical clustering analysis, which was performed for each studied year separately, followed by the evaluation and identification of the grouped periods. Dates against daily averaged values of contaminants were used as parameters for this analysis. The daily averaged values of the contaminants were also used for the second step: a descriptive statistical analysis to evaluate air pollution in general for the selected temporal clusters. The number of daily averaged observations was 859 for the cold seasons and 602 for the warm seasons for all the monitored contaminants. The spatiotemporal assessment was the third step of this study and included time series and spatial analysis using geoinformation systems. Monthly averaged values were used to identify and evaluate trends in air pollution over the studied period, while mean values for the whole period were used for the spatial assessment. Interpolation by the inverse distance method was used to describe the distribution of the contamination within the city [45]. The fourth step of this study was to perform PCA based on the correlation matrix. The correlation matrix was built using daily averaged values of the studied contaminants coupled with the meteorological parameters, while the PCA was carried out on daily averaged values of the chemical parameters only.
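The inverse distance interpolation mentioned in the third step can be sketched as follows. This is a minimal illustration, not the GIS routine used in the study; the function name `idw`, the power parameter of 2, the planar coordinates, and the station values are all assumptions.

```python
import math

def idw(stations, x, y, power=2.0):
    """Inverse-distance-weighted estimate at point (x, y).

    stations: list of (x, y, value) tuples in planar coordinates.
    power: distance exponent (2 is a common default; an assumption here).
    """
    num = 0.0
    den = 0.0
    for sx, sy, value in stations:
        d = math.hypot(x - sx, y - sy)
        if d == 0.0:
            return value  # exactly on a station: return its value
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

# Hypothetical station positions (km) and mean concentrations, illustration only.
demo_stations = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(demo_stations, 1.0, 0.0))  # midpoint between the two stations
```

With only five stations, as in this study, an interpolated surface of this kind is coarse and mainly shows the direction of gradients between stations.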

Hierarchical Clustering Analysis
Clustering analysis was performed using dates as variables, based on the contamination parameters, to identify similarities within particular periods. The final result of the analysis can be seen in Figure 2. The results indicate that the studied period can be grouped into two categories: cold seasons (the months from September to March) and warm seasons (from April to August). This can be explained by the fact that the contaminants (except formaldehyde) showed maximum concentrations during the cold seasons, with peaks in December-February (Figure 3). Thus, the following assessment includes a separate analysis of air pollution patterns in the cold and warm seasons. These findings on seasonality can also support better planning to reduce public health burdens during the seasons of interest [46].
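The seasonal grouping found above (cold: September to March; warm: April to August) can be applied to dated observations with a small helper. The sketch below is illustrative only; the function names and the demo values are assumptions, not the study's data.

```python
import statistics

# Month numbers of the cold season as identified by the cluster analysis.
COLD_MONTHS = {9, 10, 11, 12, 1, 2, 3}

def season(month):
    """Map a calendar month (1-12) to 'cold' or 'warm'."""
    if not 1 <= month <= 12:
        raise ValueError("month must be in 1..12")
    return "cold" if month in COLD_MONTHS else "warm"

def seasonal_means(records):
    """Average values per season; records is an iterable of (month, value)."""
    groups = {"cold": [], "warm": []}
    for month, value in records:
        groups[season(month)].append(value)
    return {name: statistics.mean(vals) for name, vals in groups.items() if vals}

# Made-up monthly values, for illustration only.
demo_records = [(1, 0.30), (2, 0.20), (7, 0.10)]
print(seasonal_means(demo_records))
```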

Descriptive Statistics
Tables 2 and 3 present the results of the air quality measurements from the monitoring stations in the city in comparison with the limits established by the Kazakhstani government and recommended by the WHO. It is clearly seen that heavy industry and coal-fired energy generation cause permanent exceedance of the permissible daily values across the city during both the cold and warm seasons. The worst situation is observed for NO2 and SO2, which are significantly higher than both limits (by 2-3 times for NO2 and by 1.5-2 times for SO2) during the whole period of analysis (Figure 3). Averaged concentrations of TSP slightly exceeded the established limits at Stations 1 and 3 during the cold seasons, with peak values in January 2017 and January 2018 (Figure 3), while during the warm seasons they can be characterised as safe. Concentrations of HF and H2S slightly exceed the recommended concentrations at all stations (except Station 4) during both seasons. However, HF dropped below the limit line after January 2020 (Figure 3). Surprisingly, the averaged and median concentrations of CO did not exceed the limits, although several daily exceedances were recorded during the cold seasons. In general, the concentrations of the pollutants show a slightly descending trend, while they still remain much above the permissible level. This trend is consistent with the report from the National Statistics Committee, which states that industrial emissions decreased from 29 to 27.9 kt/y for SO2 and from 10.8 to 10.4 kt/y for CO between 2017 and 2021 [47].
It is fair to note that the recently updated Kazakhstani limits [48] do not follow the recommendations of international standards [49] and experts [50]. The limits for the main pollutants have not been revised and are still above the WHO recommendations [51].
The spatial distribution of the contaminants is presented in Figures 4 and 5. Only TSP, SO2, NO2, HF, and formaldehyde show significant patterns in their dispersion within the city. It is clearly seen that the major emitter of the city is located near Station 1 and represents the northern industrial zone with two huge metallurgy factories and one thermal power plant. Surprisingly, the safest location in the city in terms of pollutant concentrations is near Station 4, despite its proximity to the northeastern industrial zone. The main difference between the seasons is the presence of pollution near Station 5, to the south of the major emitters towards downtown, which is located on the left bank of the city and has its own thermal power plant. While concentrations of HF and formaldehyde there look high, comparable with those at the centre of the emissions, during the cold seasons, the presence of these contaminants during the warm seasons looks safe and approaches minimal values within the city (Figure 4d,e and Figure 5d,e). Concentrations of SO2 and NO2 do not show significant changes between the two seasons and decrease in the southern direction from Station 1 (Figure 4b,c and Figure 5b,c). The worst situation in terms of TSP is in the northwestern part of the city near Stations 1 and 3 (Figures 4a and 5a).

Principal Component Analysis
The correlation matrix was computed for each monitoring station from all 859 and 602 measurements (for the cold and warm seasons, respectively) to determine relationships, in particular between pairs of contaminants and meteorological parameters. Figure 6 shows the correlation coefficients averaged over the monitoring stations. The analysis did not identify any correlation between the chemical and meteorological parameters (wind direction, wind speed, relative humidity, and temperature). Tables 4 and 5, which present the principal components (PCs), were therefore developed for the contaminant measurements only, to identify groups of contaminants that can be combined by their common origin or by specific properties of their spatiotemporal distribution (Table 6). Bold values in Tables 4 and 5 denote high loadings of the contaminants on the calculated PCs. The eigenvalues of the identified PCs are all greater than 1.0, so according to the Kaiser criterion these PCs are retained [52]. The results did not show significant differences between the stations in general, which may indicate a relatively even distribution of pollution within the city. Particular differences are described in the subsections for each PC below. It is important to note that the correlations during the cold seasons are stronger than during the warm seasons (Figure 6). Two PCs were identified for the cold season at monitoring Stations 1 and 2, while three PCs were identified at the other monitoring stations. The parameters included in PCs 1 and 2 for Stations 3-5 are almost the same as the parameters included in PC1 for Stations 1-2. For the warm season, PCA identified three PCs for Stations 2-5 and four PCs for Station 1.
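The component selection described above (PCA on the correlation matrix, retaining components with eigenvalues above 1.0) can be sketched in plain Python. This is not the SPSS procedure used in the study. To stay self-contained, the sketch computes eigenvalues with cyclic Jacobi rotations, a technique swapped in here for illustration; the function names and the demo correlation matrix are hypothetical.

```python
import math

def symmetric_eigenvalues(matrix, sweeps=50, tol=1e-20):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    a = [list(row) for row in matrix]
    n = len(a)
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break  # off-diagonal part is numerically zero
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                if a[p][p] == a[q][q]:
                    theta = math.pi / 4
                else:
                    theta = 0.5 * math.atan(2 * a[p][q] / (a[q][q] - a[p][p]))
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # rotate columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # rotate rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def kaiser_components(corr):
    """Return (eigenvalue, share of total variance) for eigenvalues > 1.0."""
    eig = symmetric_eigenvalues(corr)
    total = sum(eig)  # equals the number of variables for a correlation matrix
    return [(ev, ev / total) for ev in eig if ev > 1.0]

# Hypothetical correlation matrix of three contaminants, illustration only.
demo_corr = [
    [1.0, 0.5, 0.5],
    [0.5, 1.0, 0.5],
    [0.5, 0.5, 1.0],
]
print(kaiser_components(demo_corr))
```

For this toy matrix the eigenvalues are 2.0, 0.5, and 0.5, so the Kaiser criterion keeps a single component explaining two thirds of the variance.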

PC1
PC1 of the cold season is characterised by high positive weight values for TSP, SO2, CO, and H2S at every monitoring station. NO2, phenol, HF, and H2SO4 may also be conditionally added to the list, as they load on PC1 for Stations 1, 2, 3, and 4 and on PC2 for Station 5. Altogether, PC1 for Stations 1-2 and PC1 + PC2 for Stations 3-5 explain 47.7% of the total variance on average. As Figure 6 indicates, there is a strong positive correlation between TSP, SO2, CO, and H2S. NO2 and H2SO4 also show a significant correlation with the listed contaminants. These ions are the major contributors to the total suspended particles. Additionally, these ions correlate with each other. It can be concluded that these contaminants share a common source. The appearance of TSP and CO in one group with SO2 and NO2 for the cold seasons and their separation for the warm seasons can be explained by the fact that SO2 and NO2 are emitted mainly from combustion processes, while the sources of TSP are far more diverse (for example, wind-driven and traffic-related re-suspension and biogenic sources) [53]. In addition, the metallurgy sector, which prevails among the emitters in Ust-Kamenogorsk, can be characterised as the major source of SO2 [54] and TSP (as PM2.5) [55]. The presence of TSP and sulphur-derived contaminants in the air is considered representative of emissions from the combustion of fossil fuels, and exposure to them increases the risk of bronchitis and some other respiratory disorders [56]. In addition, it is worth mentioning that nitrogen dioxide has been listed as an emerging pollutant causing morbidity and mortality [57].
PC1 of the warm season seems more uncertain, as no contaminant is repeated at every monitoring station. SO2 (at four of five monitoring stations) and NO2 and formaldehyde (at three of five) were identified in PC1 as the most common contaminants. The combination of SO2, NO2, and formaldehyde is the same for monitoring Stations 2, 4, and 5. This combination of stations and contaminants can be explained by the proximity of the monitoring stations to heat supply sources and highways [58] and by the respective photochemical reactions in the nearby atmosphere [59]. In addition, HCl and H2S were identified as key contaminants in PC1 for Stations 1 and 3, which can be explained by emissions from the largest thermal power plant. A study [60] reports that H2S is formed by the combustion of coal with a high sulphur content, which is characteristic of Kazakhstani coal. The loading of HCl within this PC can be explained by emissions from Kazzinc LLP, according to the technological scheme of the enterprise.

PC2
PC2 of the warm season contains the above-mentioned contaminants at the stations where they did not load on PC1. For example, for Stations 2, 4, and 5, the list of contaminants includes HCl and H2S. As mentioned above, the source of HCl emissions can be the metallurgy industry, which uses acid for production purposes. H2S can be emitted due to coal combustion. Station 2 is located in the central part of the city at a considerable distance from industrial facilities. However, emission sources located at a sufficient elevation can disperse HCl and H2S to the centre of the city. For Stations 4 and 5, the presence of these parameters in this PC can be explained by the proximity of the northeastern industrial zone and of the Sogrinskaya and left-bank thermal power plants.
For Stations 1 and 3, the list of contaminants is completed by SO2, NO2, and formaldehyde. The northern industrial zone is the dominant emitter for Station 3, which readily explains the presence of SO2 and NO2 in this PC. HF was identified as the key contaminant for this PC at four of five monitoring stations. Figures 4 and 5 show that HF is mainly concentrated in the area of the northern industrial zone and is then evenly distributed throughout the city during both the cold and warm seasons. The thermal power plants do not monitor the concentrations of HF in their emissions, while a number of studies [61,62] and official reports [63,64] identify coal-fired stations as the largest anthropogenic sources of HF. The situation for formaldehyde looks the same as for HF: no enterprises monitor this chemical, while its spatial distribution follows a similar direction.

PC3
PC3 of the warm season groups TSP and CO, which indicates their shared source, even though there is no household heating during comfortable weather conditions and emissions from central heating are expected to be minimal. Moreover, natural processes may offset air pollution by these contaminants, which seems to be a reason for the difference in their ranking between the cold seasons (the most impactful contaminants) and the warm seasons (the least impactful contaminants). In addition, phenol (at Stations 1 and 4) and H2SO4 (at Stations 2 and 3) were found to belong to this PC. Sulphuric acid is actively used in metallurgy, and in this PC it shares a source with TSP and CO. The appearance of phenol in this PC requires additional studies, as its spread is unusual for metallurgy. This PC can be explained by geographical location, as it focuses on the air conditions in the central part of the city. While emissions can be dispersed from the northern industrial zone according to Figures 4 and 5, the presence of phenol can be explained by intensive traffic [65].
Table 6 summarises a conditional grouping of the identified PCs, the results of the spatiotemporal assessment, the hierarchical clustering analysis, and the respective chemicals. Group 1 for the cold seasons can be explained by the intensive burning of fossil fuels by all users (power plants, industry, and households) coupled with meteorological conditions (based on Figure 6). This mix means that a variety of pollutants are released into the atmosphere by the burning of different types of fuel. The same group for the warm seasons highlights the problem of coal consumption, mainly by industrial enterprises, which is indicated by the uniform seasonal distribution of NO2 and SO2 [4]. Group 2 for the cold seasons can be explained by specific industrial processes and the release of the associated contaminants into the atmosphere. Group 2 for the warm seasons could indicate a contribution of traffic to air pollution. While local authorities have attempted to explain air quality issues by a high density of motor transport, the results of this study confirm the outcomes of previous research on heavy air pollution in the city caused by non-transport-related sources [66], namely that traffic has low significance compared with the impact of industrial activities, especially those based on coal burning.

Conclusions
Ust-Kamenogorsk is one of the most important industrial centres of Kazakhstan, where non-ferrous metals are produced and exported to the world's largest companies. The active consumption of coal and the raw-material orientation of the industrial enterprises make Ust-Kamenogorsk one of the most polluted industrial cities in the world. This study is the first to analyse spatiotemporal patterns of air pollution in the city and to identify potential sources using multivariate statistical techniques. The results show that the combination of large enterprises and coal-fired thermal power plants severely affects air quality in Ust-Kamenogorsk. The average concentrations of SO2 and NO2 for the entire study period exceeded the standards of the WHO and Kazakhstan across the whole city all year round. The major emitters are located in the northern industrial zone, with two huge metallurgy factories and one thermal power plant. The Principal Component Analysis revealed that emissions of TSP, SO2, CO, H2S, NO2, HF, and H2SO4 are, with high likelihood, of industrial origin, and that they are significantly aggravated during the cold seasons. While official reports show a decrease in industrial emissions, air quality has not improved even during the warm seasons, when households stop heating and air pollution would be expected to fall.
The results of this study can be valuable for decision-makers in developing and applying strategies and actions to decrease air pollution and eliminate its social and environmental effects. The main limitation of the research is the difficulty in determining the relative contributions of the major emitters to the high level of air pollution. Future research will focus on a detailed investigation of the composition of unstudied contaminants (particularly particulate matter) in both ambient air and the zones of industrial enterprises. This research should be combined with the modelling of source profiles or fingerprints.
The air quality issues in Ust-Kamenogorsk have most probably been caused by a combination of factors: weak environmental regulation and control, the influence of large companies on the legislative process through large professional associations, and an outdated and energy-intensive industry. The authors hope that the outcomes of this and other studies will be a sufficient research basis for the authorities to develop the right strategy for air management in the region.