Temporal and Machine Learning-Based Principal Component and Clustering Analysis of VOCs and Their Role in Urban Air Pollution and Ozone Formation

Balendra V. S. Chauhan; Maureen J. Berg; Ajit Sharma; Kirsty L. Smallbone; Kevin P. Wyche

doi:10.3390/atmos16060724

¹ Centre for Environment and Societies, School of Applied Sciences, University of Brighton, Brighton BN2 4GJ, UK

² Tech GPT Ltd., Newcastle NE6 2SR, UK

³ Department of Chemistry, School of Chemical Engineering and Physical Sciences, Lovely Professional University, Phagwara 144411, Punjab, India

⁴ School of Science and Technology, Nottingham Trent University, Nottingham NG1 4BU, UK

Atmosphere 2025, 16(6), 724; https://doi.org/10.3390/atmos16060724

This article belongs to the Special Issue Air Pollution: Emission Characteristics and Formation Mechanisms

Abstract

This study investigates the temporal dynamics, sources, and photochemical behaviour of key volatile organic compounds (VOCs) along Marylebone Road, London (1 January 2015–1 January 2023), a heavily trafficked urban area. Hourly measurements of benzene, toluene, ethylbenzene, ethene, propene, isoprene, propane, and ethyne, alongside ozone (O₃) and meteorological data, were analysed using correlation matrices, regression, cross-correlation, diurnal/seasonal analysis, wind-sector analysis, PCA (Principal Component Analysis), and clustering. Strong inter-VOC correlations (e.g., benzene–ethylbenzene: r = 0.86, R² = 0.75; ethene–propene: r = 0.68, R² = 0.53) highlighted dominant vehicular sources. Diurnal peaks of benzene, toluene, and ethylbenzene aligned with rush hours, while O₃ minima occurred in early mornings due to NO titration. VOCs peaked in winter under low mixing heights, whereas O₃ was highest in summer. Wind-sector analysis revealed dominant VOC emissions from SSW (south-southwest)–WSW (west-southwest) directions; ethyne peaked from the E (east)/ENE (east-northeast). O₃ concentrations were highest under SE (southeast)–SSE (south-southeast) flows. PCA showed 39.8% of variance linked to traffic-related VOCs (PC1) and 14.8% to biogenic/temperature-driven sources (PC2). K-means clustering (k = 3) identified three regimes: high VOCs/low O₃ in stagnant, cool air; mixed conditions; and low VOCs/high O₃ in warmer, aged air masses. Findings highlight complex VOC–O₃ interactions and stress the need for source-specific mitigation strategies in urban air quality management.

Keywords: ozone; principal component analysis; K-means; VOCs; meteorology; urban air quality; photochemistry; wind-sector analysis; machine learning

1. Introduction

VOCs are a significant group of organic chemicals that vaporize easily and are typically found in gaseous form, readily entering the environment under normal conditions [1,2,3,4]. Many are synthetic chemicals broadly used in the production of numerous day-to-day products for residential and commercial applications [5]. VOCs can occur naturally in the environment as biogenic compounds emitted by plants [6], or they can be anthropogenic, resulting from human activities [7]. Although biogenic emissions persist, there has been a substantial rise in anthropogenic VOC emissions over recent decades due to increasing industrialization and urbanization [5]. Urban areas are particularly vulnerable to anthropogenic VOC emissions, with major sources including vehicular traffic and industrial operations [8]. Vehicle emissions remain a predominant contributor, with VOC levels influenced by vehicle type, age, fuel composition, engine efficiency, driving patterns, and maintenance [9]. Emissions often include a wide spectrum of compounds such as alkanes, alkenes, aromatics, and halocarbons: for instance, hexene, pentene, butene, butadiene, dodecane, undecane, decane, octane, methyl-cyclohexane, diethylbenzene, propylbenzene, trimethylbenzene, ethylbenzene, styrene, benzene, toluene, and various chlorinated hydrocarbons [10].

In addition to outdoor sources, VOCs are widespread in indoor environments [11,12]. Modern residential, commercial, and institutional buildings often utilize chemical-based products during construction and furnishing such as paints, adhesives, sealants, varnishes, and cleaning agents that emit VOCs [13,14,15]. Common indoor VOCs may originate from carpets (e.g., benzene, styrene) [16], household cleaners (e.g., formaldehyde, xylene) [17], personal care products (e.g., toluene) [18], electronics (e.g., formaldehyde) [19], and plastic materials (e.g., ethylbenzene) [20]. Tobacco smoke is another significant indoor source, frequently containing hazardous compounds like benzene, toluene, ethylbenzene, xylene (BTEX), and formaldehyde [21]. Once released into the atmosphere, VOCs undergo chemical reactions with ambient pollutants and solar radiation, significantly influencing tropospheric chemistry [22]. These reactions alter the concentrations of hydroxyl radicals (OH), contribute to the formation of secondary organic aerosols and organic acids, and facilitate the production of ozone through photochemical processes [23]. In areas with intense sunlight such as the western Mediterranean region, the interplay between biogenic and anthropogenic precursors can intensify ozone formation [24].

VOCs in urban environments originate from a combination of anthropogenic and biogenic sources, each contributing differently to the overall emission profile. Transportation is often the dominant source in high-traffic corridors, accounting for over 50% of total urban VOC emissions in several European cities, primarily due to fuel combustion and evaporative losses from petrol and diesel vehicles [25,26]. Common species emitted include benzene, toluene, ethylbenzene, xylenes (BTEX), ethene, and propene. Industrial sources contribute significantly to localised emissions, especially near refineries, manufacturing hubs, or chemical plants, where chlorinated hydrocarbons, alkanes, and aromatics are released during solvent use, degreasing, and chemical manufacturing [27]. Biogenic sources, particularly vegetation, emit isoprene and monoterpenes, which can be substantial during summer months and are highly reactive in ozone formation processes [28]. Although their contribution to total VOC levels is typically lower in urban cores, they can play a significant role in ozone production under VOC-limited regimes. Understanding the relative contributions from these sources is essential for accurately apportioning VOCs and designing effective emission control strategies. Studies frequently report strong correlations between VOCs such as benzene, ethylbenzene, toluene, ethene, and propene, suggesting common emission sources, especially vehicular exhaust and fossil fuel combustion [29,30].

Numerous studies have applied a range of statistical and analytical methods to understand the behaviour, sources, and impacts of VOCs and their role in ozone formation. Correlation and regression analyses are commonly used to identify interrelationships and common sources among VOC species [31]. Principal Component Analysis (PCA) and Positive Matrix Factorization (PMF) are widely used for source apportionment and dimensionality reduction in VOC datasets [32,33,34]. Diurnal and seasonal pattern analysis is frequently employed to link VOC variability with traffic patterns and boundary layer dynamics [35]. Additionally, clustering techniques such as K-means and hierarchical clustering have gained popularity for identifying pollution regimes and understanding atmospheric processing [36,37]. However, only a limited number of studies have combined long-term high-resolution datasets with unsupervised machine learning and lagged cross-correlation analyses to explore dynamic VOC–ozone interactions under diverse meteorological conditions, particularly in dense urban settings in the UK. This study bridges that gap through a novel integrative framework.

Ozone is a secondary pollutant formed through complex photochemical reactions involving VOCs and NO_X in the presence of sunlight [38]. The primary mechanism begins with the photolysis of NO₂:

\mathrm{NO_2} + h\nu \to \mathrm{NO} + \mathrm{O(^3P)} \qquad \text{(R1)}

\mathrm{O(^3P)} + \mathrm{O_2} + \mathrm{M} \to \mathrm{O_3} + \mathrm{M} \qquad \text{(R2)}

VOCs influence ozone formation by producing peroxy radicals (RO₂•) during their oxidation, which convert NO to NO₂ without consuming ozone:

\mathrm{RO_2^{\bullet}} + \mathrm{NO} \to \mathrm{RO^{\bullet}} + \mathrm{NO_2} \qquad \text{(R3)}

This NO₂ can then photolyze again, producing more ozone. The efficiency of this cycle depends heavily on the VOC-to-NO_X ratio. In VOC-limited regimes (common in urban areas with high NO_X emissions), adding more VOCs increases ozone production, whereas reducing NO_X can initially lead to more ozone due to decreased titration [39]. In contrast, in NO_X-limited regimes (often found in rural or downwind regions with low NO_X), ozone formation is limited by NO_X availability, and VOC reductions have less impact [40]. VOC species also differ in their reactivity; alkenes and aromatics such as isoprene and toluene form ozone more efficiently due to their high reactivity and radical generation potential under sunlight.

Environmental factors such as temperature, solar radiation, and boundary layer dynamics further influence these reactions. Elevated temperatures and intense sunlight accelerate VOC oxidation, enhance photolysis rates, and thus increase ozone formation. These mechanisms explain the observed inverse seasonal patterns in VOC and ozone concentrations and underscore the importance of considering chemical regimes when designing mitigation policies.

From a health perspective, long-term exposure to VOCs is increasingly associated with various adverse effects [41,42,43]. Due to their chemical reactivity, VOCs can cause toxic, allergic, mutagenic, and carcinogenic outcomes depending on exposure levels and durations [44,45]. For instance, Wang et al. (2025) analysed data from the U.S. National Health and Nutrition Examination Survey (NHANES 2011–2020) and found that elevated VOC biomarkers were significantly associated with increased cardiovascular risk indicators, including blood pressure and systemic inflammation markers [42]. Health risks also vary based on compound type, exposure environment, and individual susceptibility. Prolonged exposure, especially in indoor settings, has been linked to serious outcomes such as cancer [46]. Tsai (2019) reviewed VOCs regulated as indoor air pollutants and concluded that several, including benzene and trichloroethylene, are linked to leukaemia, liver toxicity, and neurobehavioral effects [46]. Notably, compounds like trichloroethylene, vinyl chloride, benzene, and formaldehyde are recognized for their high toxicity and carcinogenic potential [47,48]. McCarthy et al. (2006) assessed background concentrations of 18 air toxics, including benzene and formaldehyde, and reported elevated cancer risk levels associated with ambient exposure in urban North America [48]. Certain studies have connected domestic exposure, e.g., cooking fuels or poor ventilation, to elevated cancer risk, particularly among women and children [49,50]. Other research has highlighted links between VOCs and asthma exacerbation or cardiovascular issues [51]. In a meta-analysis, Alford and Kumar (2021) found consistent links between indoor VOC exposure and respiratory symptoms, including coughing, wheezing, and asthma onset in children and adults [12]. 
These findings reinforce the urgency of controlling VOC emissions in urban environments, both to meet air quality standards and to protect long-term public health. Despite their widespread presence, public awareness of VOC exposure remains limited due to their often-hidden presence in consumer products and indoor environments.

Beyond the air, VOCs are also detected in soil and water [52]. Groundwater contamination can occur through industrial spills or improper waste disposal. This contamination may pose health risks when groundwater is used as a drinking source [53]. Detecting VOCs in water bodies poses analytical challenges due to their volatile nature and the sensitivity required in sampling. Analytical methods include gas chromatography (GC), mass spectrometry (MS), and more advanced setups like purge-and-trap GC/MS (e.g., EPA Method 524.2), headspace solid-phase microextraction (HS-SPME), surface acoustic wave sensors (SAW), ion mobility spectrometry (IMS), and photoionization detection (PID) [54]. GC and GC/MS techniques are especially favoured due to their high accuracy and sensitivity. Given their pervasive nature, diverse sources, and complex behaviour, VOCs represent a critical concern for environmental monitoring and public health. Their impact spans across atmospheric chemistry, human health, and ecosystem integrity. Accurate estimation and prediction of VOC dispersion are crucial particularly in urban areas with dense human activity and overlapping sources. Dispersion modelling techniques, including atmospheric models, are vital for simulating how VOCs move through and react within the atmosphere. When combined with extensive datasets and modern tools such as machine learning, these models can yield deeper insights into VOC patterns, enhance forecasting capabilities, and inform policy decisions for effective air quality management.

Despite extensive work on VOC emissions and ozone formation, several critical gaps persist. Most studies focus on short-term monitoring campaigns or isolated pollutants, limiting our understanding of long-term dynamics and inter-species behaviour under varying meteorological conditions. Moreover, few investigations in UK urban environments have employed an integrated framework combining multivariate statistics, unsupervised machine learning (PCA and clustering), and time-lagged cross-correlation to disentangle the complex interactions between VOCs and ozone. This study addresses these gaps by analysing an 8-year high-resolution dataset from a key urban traffic corridor, applying novel analytical tools to uncover emission patterns, temporal behaviours, and photochemical regimes. Such an approach enhances our ability to inform policy and design targeted interventions for effective urban air quality management.

Hence, this research aims to systematically investigate the ambient behaviour, temporal dynamics, and source characteristics of key VOCs in an urban environment by integrating advanced statistical analyses, including correlation matrices, linear regression, and cross-correlation techniques, with meteorological and ozone data. The objectives include (i) identifying major VOC species contributing to ozone formation under VOC-limited regimes, (ii) characterizing temporal (diurnal, weekly, seasonal) variability of VOCs and ozone to infer patterns of emission and transformation, (iii) evaluating the spatial influence of emission sources using wind-sector analyses and polar plots, and (iv) applying principal component analysis and clustering methods to classify pollution regimes and understand underlying atmospheric processes. Through these analyses, the study seeks to provide insights into VOC–ozone interactions, highlight the significance of anthropogenic and meteorological drivers, and inform targeted mitigation strategies for air quality management in densely populated urban areas.

2. Methodology

2.1. Instrumentation and Data Collection

Data for this study were collected from the Marylebone Road supersite in central London, a well-established urban air quality monitoring location characterized by heavy traffic and diverse emission sources. Measurements focused on key VOCs including benzene, toluene, ethylbenzene, ethene, propene, isoprene, propane, and ethyne, alongside ozone and meteorological parameters. VOCs, O₃, and meteorological parameters were measured at a temporal resolution of 15 min between 1 January 2015 and 1 January 2023. The primary instrument used for VOC detection was a Hewlett-Packard Gas Chromatograph with Flame Ionisation Detector (GC-FID), operated in compliance with the protocols of the UK Hydrocarbon Monitoring Network. This system provides high temporal resolution, sensitivity, and compound specificity for hydrocarbons. In addition, benzene (C₆H₆) was measured independently using a Differential Optical Absorption Spectroscopy (DOAS) system for cross-validation and source attribution analysis. O₃ concentrations were recorded via a UV photometric O₃ monitor, which meets the EU reference method standards for ambient O₃.

The meteorological data were concurrently measured to account for the local atmospheric dynamics influencing VOC variability. These parameters included ambient temperature (°C), relative humidity (%), atmospheric pressure (hPa), wind speed (m/s), wind direction (degrees from North), global solar radiation (W/m²), and precipitation (mm), captured via an on-site meteorological station equipped with standard meteorological sensors.

2.2. Supplementary Regional Meteorological Data

To overcome the micro-scale limitations of the street canyon and better reflect synoptic meteorological trends, data were obtained from a regional urban background site (Station: 51.505° N, 0.055° W). Hourly air temperature, wind speed, wind direction, and atmospheric pressure data were retrieved from NOAA’s Integrated Surface Database using the worldmet package in R. These additional datasets were used to support the analysis of large-scale air mass transport and to verify observed patterns at the roadside site.

2.3. Data Preprocessing and Quality Control

All pollutant and meteorological data underwent a rigorous quality control (QC) protocol: instrument error flags, calibration periods, and invalid readings were removed. Data were checked for continuity and synchronized across all time series. Minor missing values (<1%) were interpolated. VOC concentrations were log-transformed (where necessary) to normalize skewed distributions. All measurements were aggregated to hourly means for consistency and computational efficiency. Variables were standardized (z-score normalization) prior to statistical and machine learning analyses to allow fair comparison between differing units and scales.
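The aggregation, log-transform, and standardization steps described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the function names and the sample readings are hypothetical, and flagged values are simply skipped rather than interpolated.

```python
import math
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def hourly_means(records):
    """Average (timestamp, value) pairs within each clock hour; None values are skipped."""
    buckets = defaultdict(list)
    for ts, value in records:
        if value is not None:
            buckets[ts.replace(minute=0, second=0, microsecond=0)].append(value)
    return {hour: mean(vals) for hour, vals in sorted(buckets.items())}


def log_transform(values):
    """Natural-log transform for right-skewed, strictly positive concentrations."""
    return [math.log(v) for v in values]


def zscore(values):
    """Standardise to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


# Hypothetical 15-min benzene readings (ug/m3); None marks a reading removed by QC.
raw = [
    (datetime(2015, 1, 1, 8, 0), 1.0),
    (datetime(2015, 1, 1, 8, 15), 2.0),
    (datetime(2015, 1, 1, 8, 30), None),
    (datetime(2015, 1, 1, 8, 45), 3.0),
    (datetime(2015, 1, 1, 9, 0), 4.0),
    (datetime(2015, 1, 1, 9, 15), 6.0),
    (datetime(2015, 1, 1, 10, 0), 8.0),
]
hourly = hourly_means(raw)
standardised = zscore(log_transform(list(hourly.values())))
```

Standardizing after the log transform keeps a few extreme roadside peaks from dominating the variance-based methods applied later.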

2.4. Temporal and Statistical Analyses

Temporal behaviour of VOCs and O₃ was analysed using diurnal, weekly, and seasonal cycle plots, Spearman correlation matrices, and cross-correlation function (CCF) analysis to explore time-lagged interactions between VOCs and O₃. Linear regression models were used to assess co-variation among selected VOCs and infer shared source categories. Furthermore, wind sector and polar plot analyses were used to explore spatial influences and directional trends in pollutant concentrations, particularly from key sectors such as SW, SSW, and WSW, which are indicative of traffic and industrial source areas.
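As an illustration of the wind-sector step, the sketch below bins wind directions into 16 compass sectors of 22.5° each and averages concentrations per sector. The 16-sector scheme is inferred from the sector names used in the text (SSW, WSW, ENE, and so on). The function names and sample values are hypothetical.

```python
from collections import defaultdict
from statistics import mean

SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def wind_sector(direction_deg):
    """Map a wind direction (degrees from north) to one of 16 sectors, each 22.5 degrees wide."""
    index = int((direction_deg % 360) / 22.5 + 0.5) % 16
    return SECTORS[index]


def sector_means(directions, concentrations):
    """Mean concentration for every sector that has at least one observation."""
    groups = defaultdict(list)
    for direction, conc in zip(directions, concentrations):
        groups[wind_sector(direction)].append(conc)
    return {sector: mean(values) for sector, values in groups.items()}


# Hypothetical hourly wind directions (degrees) and toluene concentrations (ug/m3).
directions = [200.0, 205.0, 250.0, 90.0]
toluene = [3.0, 5.0, 2.0, 1.0]
by_sector = sector_means(directions, toluene)
```

Each sector is centred on its nominal bearing, so 202.5° falls in the middle of SSW and directions just below 360° wrap around to N.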

2.5. Principal Component Analysis (PCA)

To reduce dimensionality and uncover latent pollutant patterns, Principal Component Analysis (PCA) was conducted on the standardized dataset. Variables included all eight VOC species, ozone, and meteorological parameters (temperature, wind speed, wind direction). The first two principal components accounted for the majority of the total variance (PC1 ≈ 40%, PC2 ≈ 15%), where PC1 captured vehicular and combustion-related VOCs (e.g., benzene, toluene, ethene, propene) while PC2 reflected biogenic and temperature-driven influences (e.g., isoprene and temperature). Higher-order PCs highlighted meteorological dispersion effects (e.g., wind speed/direction). The PCA output was interpreted through scree plots, loading scores, and biplots to distinguish between source profiles and environmental drivers.
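To make the link between standardized variables, loadings, and explained variance concrete, the following sketch extracts the first principal component of a small hypothetical dataset. It uses power iteration on the correlation matrix as a standard-library stand-in for a PCA routine. The source does not say which implementation was used, and all names and values here are illustrative assumptions.

```python
from statistics import mean, pstdev


def standardise(columns):
    """Z-score each variable (one list per variable) so units and scales are comparable."""
    out = []
    for col in columns:
        mu, sigma = mean(col), pstdev(col)
        out.append([(v - mu) / sigma for v in col])
    return out


def correlation_matrix(z):
    """Correlation matrix of standardised variables (mean product of z-scores)."""
    n = len(z[0])
    return [[sum(a * b for a, b in zip(zi, zj)) / n for zj in z] for zi in z]


def leading_component(matrix, iterations=500):
    """Power iteration: returns (largest eigenvalue, corresponding unit eigenvector)."""
    vec = [1.0] * len(matrix)
    for _ in range(iterations):
        nxt = [sum(m * v for m, v in zip(row, vec)) for row in matrix]
        norm = sum(x * x for x in nxt) ** 0.5
        vec = [x / norm for x in nxt]
    # Rayleigh quotient v^T M v for the converged unit vector.
    eigenvalue = sum(v * sum(m * w for m, w in zip(row, vec))
                     for v, row in zip(vec, matrix))
    return eigenvalue, vec


# Hypothetical hourly values for three variables (not data from the study).
benzene = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
toluene = [2.1, 3.9, 6.2, 8.1, 9.8, 12.1]
ozone = [60.0, 52.0, 47.0, 35.0, 33.0, 20.0]

z = standardise([benzene, toluene, ozone])
eigenvalue, loadings = leading_component(correlation_matrix(z))
explained = eigenvalue / len(z)  # share of total variance carried by PC1
```

Because each standardized variable contributes one unit of variance, the explained share is the eigenvalue divided by the number of variables. In this toy example the two aromatics load with the same sign and ozone with the opposite sign, mirroring how a traffic-type component would be read from a loading table.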

2.6. K-Means Clustering Analysis

A K-means clustering algorithm was applied to the PCA-transformed data (first six PCs retained) to categorize distinct atmospheric regimes. Due to computational constraints, a random subset of 500 hourly observations was clustered initially, with cluster labels mapped back to the full dataset using proximity-based classification. The optimal number of clusters (k = 3) was chosen based on the elbow method applied to within-cluster sum of squares (WSS). The resulting clusters were interpreted as follows: Cluster 1: high VOCs, low O₃—fresh primary emissions under stagnant, cool conditions. Cluster 2: moderate VOCs and O₃—transitional regime indicating partial photochemical processing. Cluster 3: low VOCs, high O₃—aged air masses where VOCs have reacted, leading to secondary ozone accumulation under warm, windy conditions. Cluster profiles were visualized using spider (radar) plots, allowing intuitive comparison of pollutant and meteorological fingerprints across regimes.
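The subset-then-map workflow can be sketched as below, under stated assumptions. "Proximity-based classification" is read here as nearest-centroid assignment. A deterministic farthest-first initialisation is used for reproducibility and is not taken from the source. The two-dimensional synthetic scores stand in for the retained principal components.

```python
import random


def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, centroids):
    """Index of the closest centroid (used for fitting and for mapping labels back)."""
    return min(range(len(centroids)),
               key=lambda i: squared_distance(point, centroids[i]))


def farthest_first(points, k):
    """Deterministic seeding: each new centroid is the point farthest from those chosen."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(squared_distance(p, c)
                                                       for c in centroids)))
    return centroids


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns (centroids, within-cluster sum of squares)."""
    centroids = farthest_first(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, centroids)].append(p)
        updated = [tuple(sum(dim) / len(g) for dim in zip(*g)) if g else centroids[i]
                   for i, g in enumerate(groups)]
        if updated == centroids:
            break
        centroids = updated
    wss = sum(squared_distance(p, centroids[nearest(p, centroids)]) for p in points)
    return centroids, wss


# Synthetic PC scores: three well-separated regimes of 200 hourly points each.
rng = random.Random(42)
centres = [(-3.0, 0.0), (0.0, 0.0), (3.0, 0.0)]
full = [(cx + rng.uniform(-0.5, 0.5), cy + rng.uniform(-0.5, 0.5))
        for cx, cy in centres for _ in range(200)]

subset = random.Random(1).sample(full, 60)                   # cluster a random subset
wss_by_k = {k: kmeans(subset, k)[1] for k in range(1, 6)}    # elbow curve (WSS vs k)
centroids, _ = kmeans(subset, 3)
labels = [nearest(p, centroids) for p in full]               # map labels to all points
```

Plotting `wss_by_k` against k gives the elbow curve. For data like these the drop in WSS flattens after k = 3, which is the kind of behaviour the elbow criterion looks for.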

3. Results and Discussion

3.1. Correlation Analysis of VOCs, O₃, and Meteorological Variables

The Spearman correlation analysis elucidated the relationships among VOC species and O₃ (as shown in Figure 1). Strong positive correlations were observed among benzene, ethylbenzene (EBenzene), toluene, ethene, and propene, with correlation coefficients exceeding 0.6 in many cases, indicating a shared source, likely from vehicular emissions and fossil fuel combustion.

Figure 1. Spearman correlation matrix of VOCs, O₃, and meteorological variables.

Notably, ethene and propene showed particularly strong associations (r ≈ 0.68), reinforcing their co-emission from anthropogenic activities. In contrast, ozone exhibited moderate to weak negative correlations with most VOCs (e.g., −0.56 with benzene, −0.54 with ethylbenzene, −0.56 with toluene), suggesting that higher VOC concentrations are often associated with lower ozone levels at the measurement timescale, likely due to VOC-limited ozone formation regimes typical in urban environments. These findings imply that while VOCs contribute to ozone production through photochemical reactions, the presence of high VOC levels might simultaneously reflect periods of less-efficient ozone formation, possibly influenced by titration effects with NO. Overall, the correlation patterns highlight the intertwined dynamics of primary emissions and secondary pollutant formation in the studied environment.
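For readers unfamiliar with the statistic, a Spearman coefficient is the Pearson correlation of the ranked data, with tied values sharing their average rank. The sketch below shows this with hypothetical values. The function names are illustrative, and this is not the software used in the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for idx in order[i:j + 1]:
            ranks[idx] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation applied to average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical hourly concentrations: ozone falls monotonically as benzene rises.
benzene = [0.5, 1.2, 0.9, 2.4, 1.8]
ozone = [40.0, 22.0, 31.0, 8.0, 15.0]
rho = spearman(benzene, ozone)
```

Because only ranks enter the calculation, the coefficient captures monotonic but non-linear co-variation and is less sensitive to the skewed distributions typical of roadside VOC data.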

3.2. Linear Regression Analysis of Concerned VOCs

The linear regression analysis between benzene (C₆H₆) and ethylbenzene (C₆H₅C₂H₅) at Marylebone Road reveals a strong positive relationship (as shown in Figure 2a). The slope of 1.26 indicates that for every 1 μg/m³ increase in ethylbenzene concentration, benzene concentration increases by approximately 1.26 μg/m³, and the R² value of 0.75 indicates that 75% of the variability in benzene can be explained by ethylbenzene levels. The positive correlation suggests that both compounds share a common emission source, likely vehicular exhaust and fossil fuel combustion. The observed intercept of 0.24 implies a baseline concentration of benzene even when ethylbenzene is low, possibly from background sources or photochemical reactions. In urban environments, benzene and ethylbenzene are known to undergo photochemical oxidation, leading to the formation of O₃ via reactions with OH and NO_X. A simplified scheme for VOCs such as benzene and ethylbenzene is as follows:

\mathrm{VOC} + \mathrm{OH} \to \text{peroxy radical } (\mathrm{RO_2}) \qquad \text{(R4)}

\mathrm{RO_2} + \mathrm{NO} \to \mathrm{O_3} \qquad \text{(R5)}

Figure 2. Hexbin plots of concerned VOCs, establishing their correlations (μg/m³) and behaviours, where (a) correlation plot of Benzene vs. EBenzene, (b) correlation plot of Toluene vs. EBenzene, (c) correlation plot of Benzene vs. Propene, and (d) correlation plot of Ethene vs. Propene.

These reactions contribute to ozone formation in the presence of abundant VOCs and sunlight, though high NO_X concentrations can suppress ozone production via the titration effect, where NO reacts with ozone:

\mathrm{NO} + \mathrm{O_3} \to \mathrm{NO_2} + \mathrm{O_2} \qquad \text{(R6)}

The negative correlation between VOCs and ozone observed in some studies reflects this dynamic, where high VOC levels coincide with reduced ozone formation under high NO_X conditions. Similar correlations between benzene and ethylbenzene have been observed in other urban studies, such as those conducted in Beijing [55,56], highlighting the significant role of vehicular emissions in contributing to urban air pollution. These findings underline the environmental and health risks posed by these toxic VOCs, which are associated with increased risks of leukaemia (for benzene) and neurotoxicity (for ethylbenzene). Reducing their levels would require addressing emissions from traffic and promoting cleaner, low-emission technologies.

The linear regression analysis between toluene and ethylbenzene (as shown in Figure 2b) reveals a moderate positive association, described by the equation Y = 1.06 + 2.8X with an R² value of 0.418. This indicates that while both pollutants share common sources, primarily vehicular exhaust, fuel evaporation, and industrial solvent use, their emissions and atmospheric behaviour are not entirely synchronized. The relatively steep slope suggests that toluene concentrations rise more rapidly than ethylbenzene, and the non-zero intercept (1.06 μg/m³) implies a persistent background level of toluene, possibly due to additional inputs from commercial and industrial solvent applications or more localized emissions. Though the key photochemical reactions have been previously discussed, it is important to note that toluene, like ethylbenzene, undergoes OH-initiated oxidation, forming peroxy radicals and contributing to ozone formation under suitable sunlight and NO_X conditions. The moderate correlation may also reflect differences in atmospheric lifetimes, reactivities, or proximity to emission sources. Monod et al. (2001) found strong toluene–ethylbenzene correlations (R² ≈ 0.94) in traffic-related samples, but weaker correlations in urban background air, due to additional sources of toluene (e.g., solvents, paint, industrial use) [57].

The current study’s moderate R² value (0.418) suggests a similar pattern, indicating partially shared sources but also the influence of diverse urban emission sources, which is consistent with their findings in mixed-source environments. Kheirbek et al. (2012) observed that traffic density and industrial activities influenced both toluene and ethylbenzene concentrations, again implying shared but not identical emission origins, supportive of this study’s regression results, where the association is moderate but not strong [58]. Similar source-divergent patterns in aromatic VOCs have been observed in other urban settings. For example, Na et al. (2005) [59] found variable contributions of mobile and evaporative sources to aromatic VOC levels in Seoul, while Mandal et al. (2023) [60] reported distinct diurnal and seasonal VOC trends in Delhi tied to traffic intensity and industrial activities. These studies reinforce the interpretation that aromatic VOCs in urban corridors like Marylebone Road arise from a complex mix of emissions, atmospheric processes, and chemical transformations, necessitating compound-specific mitigation strategies.

The regression analysis between benzene and propene concentrations (as shown in Figure 2c) gave the equation Y = −0.07 + 0.96X with an R² value of 0.433, indicating a moderate positive correlation. This suggests a partial overlap in their emission sources, predominantly vehicular exhaust and combustion of fossil fuels, both of which are known to emit aromatic hydrocarbons (e.g., benzene) and light alkenes (e.g., propene). The near-unity slope (0.96) indicates a proportional relationship between their concentrations, while the negative intercept may reflect instrument detection limits or background variability at low propene levels. From an atmospheric chemistry perspective, benzene is relatively chemically stable, with an atmospheric lifetime of several days, whereas propene is much more reactive due to its carbon–carbon double bond, undergoing rapid oxidation via hydroxyl radicals (OH) and contributing to tropospheric O₃ and peroxyacetyl nitrate (PAN) formation. Despite their co-emission, propene’s faster photochemical degradation compared to benzene may account for the moderate R², rather than a stronger association. Similar patterns have been observed in other urban environments. For instance, Ait-Helal et al. (2014) conducted a study in suburban Paris and reported that while benzene and propene are both emitted from traffic-related sources, their ambient concentrations and correlations are influenced by seasonal variations and atmospheric processing [61]. The study highlighted that propene levels exhibited significant diurnal and seasonal variability due to its higher reactivity, whereas benzene showed more stable concentrations. This differential behaviour underscores the complexity of VOC dynamics in urban atmospheres and the importance of considering both emission sources and atmospheric chemistry when interpreting pollutant relationships.

The regression analysis between ethene and propene concentrations (as shown in Figure 2d) yielded the equation Y = 0.13 + 2.02X with an R² value of 0.53, indicating a moderate positive correlation. This suggests that while ethene and propene share common emission sources, such as vehicular exhaust and fossil fuel combustion, their atmospheric behaviours and reactivities differ. Both compounds are reactive alkenes that play significant roles in urban photochemistry, particularly in the formation of tropospheric ozone and secondary organic aerosols. Their atmospheric lifetimes are relatively short due to rapid reactions with hydroxyl radicals, leading to the production of formaldehyde (CH₂O) and other photochemical oxidants. The observed moderate correlation may reflect the influence of varying emission strengths, atmospheric processing, and differing reactivities under urban conditions [62].
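The slope, intercept, and R² values quoted in this section follow from an ordinary least-squares fit of the form Y = intercept + slope·X. The sketch below assumes ordinary least squares, since the source says only "linear regression". The first dataset is constructed to lie exactly on the reported benzene–ethylbenzene line, so it illustrates the arithmetic rather than reproducing the measurements.

```python
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (intercept, slope, r_squared).
    """
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return intercept, slope, 1 - ss_res / ss_tot


# Hypothetical ethylbenzene values (ug/m3) and benzene values placed exactly on
# the reported line Y = 0.24 + 1.26 X, so the fit recovers those coefficients.
ethylbenzene = [0.5, 1.0, 1.5, 2.0, 2.5]
benzene_exact = [0.24 + 1.26 * x for x in ethylbenzene]
intercept, slope, r2_exact = fit_line(ethylbenzene, benzene_exact)

# Adding illustrative scatter lowers R^2 below 1.
scatter = [0.2, -0.2, 0.1, -0.1, 0.0]
benzene_noisy = [b + s for b, s in zip(benzene_exact, scatter)]
_, _, r2_noisy = fit_line(ethylbenzene, benzene_noisy)
```

R² is the share of variance in Y reproduced by the fitted line. Values of roughly 0.4 to 0.5, as reported for several VOC pairs here, therefore leave about half of the variability to other sources or to atmospheric processing.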

3.3. Temporal Dynamics of VOC–Ozone Interactions via Cross-Correlation Analysis

The Cross-Correlation Function (CCF) analysis helps identify the time-lagged relationships between VOCs and ozone concentrations, revealing whether changes in VOC levels precede or follow changes in ozone. This is crucial for understanding the temporal dynamics of ozone formation driven by VOC emissions under photochemical conditions. The correlations are computed over lags from −24 to +24 h, where a positive lag indicates that the VOC leads O₃.
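The lag convention can be made concrete with a small sketch. It correlates the VOC at hour t with ozone at hour t + lag, so a positive lag means the VOC leads, and it normalizes by the full-series variances, as common CCF routines do. The synthetic series and function name are assumptions for illustration only.

```python
import math
from statistics import mean


def cross_correlation(voc, ozone, max_lag):
    """Correlation of voc[t] with ozone[t + lag]; positive lag means the VOC leads ozone."""
    n = len(voc)
    mv, mo = mean(voc), mean(ozone)
    dv = [v - mv for v in voc]
    do = [o - mo for o in ozone]
    scale = (sum(a * a for a in dv) * sum(b * b for b in do)) ** 0.5
    ccf = {}
    for lag in range(-max_lag, max_lag + 1):
        # Use only the hours for which both series are available at this lag.
        total = sum(dv[t] * do[t + lag] for t in range(n) if 0 <= t + lag < n)
        ccf[lag] = total / scale
    return ccf


# Synthetic 10-day hourly series: ozone mirrors the VOC cycle with a 3 h delay.
hours = range(240)
voc = [math.sin(2 * math.pi * t / 24) for t in hours]
ozone = [-math.sin(2 * math.pi * (t - 3) / 24) for t in hours]

ccf = cross_correlation(voc, ozone, max_lag=24)
strongest_lag = min(ccf, key=ccf.get)  # lag with the most negative correlation
```

In this constructed case the most negative coefficient falls at lag +3, matching the 3 h delay built into the ozone series. In real data a strong diurnal cycle in both series can produce similar structure, so lagged peaks alone do not establish causation.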

The CCF analysis between benzene and ozone (as shown in Figure 3a) reveals a consistent negative relationship across all time lags. The peak negative correlation occurs at lag 0 (−0.3749), suggesting a contemporaneous inverse relationship where higher benzene concentrations are associated with lower ozone levels. This trend extends over a wide temporal window, with notable negative correlations at lag −1 (−0.3540), lag −2 (−0.3261), and lag −3 (−0.2987), indicating that ozone levels in the preceding hours are also inversely related to benzene. The strength of the negative correlation gradually diminishes at positive lags (VOC leading ozone), but the relationship remains negative throughout, with values such as lag 1 (−0.3602), lag 2 (−0.3346), lag 5 (−0.2668), and up to lag 24 (−0.2108). This sustained pattern indicates a persistent inverse association, suggesting that benzene does not play a direct ozone-forming role in this setting and might act more as a sink or reactant that consumes oxidants rather than promoting ozone buildup. For instance, Sharma et al. (2021) reported a moderate negative relationship between benzene and ozone concentrations, with coefficients of determination of r² = 0.475 at DMS and r² = 0.356 at NSIT, indicating that higher benzene levels are associated with lower ozone concentrations [63]. The study also highlighted that benzene concentrations are influenced by meteorological parameters, which in turn affect ozone formation.

Figure 3. Cross-correlation between VOC and O₃ concentrations. The x-axis shows lag values (negative: O₃ leads, positive: VOC leads) and the y-axis shows the cross-correlation coefficient (ACF). Peaks indicate the strength and direction of the relationship at each lag. Dashed lines represent 95% confidence limits. The cross correlations shown in this figure are (a) Benzene vs. O₃, (b) Isoprene vs. O₃, (c) Propene vs. O₃, (d) Ethene vs. O₃, and (e) Toluene vs. O₃.

For isoprene and O₃ (as shown in Figure 3b), the CCF analysis revealed a peak positive correlation between isoprene and ozone concentrations at lags +19 to +21 h, with a maximum correlation coefficient of approximately 0.11. This gradual increase from lag −20 to 0, peaking around lag −5 to +5 and stabilizing until lag +21, suggests that isoprene emissions may precede ozone formation, albeit with a weak relationship. Isoprene’s role in ozone formation is likely secondary or dependent on other atmospheric conditions, such as the presence of nitrogen oxides (NOₓ) and sunlight. Studies have shown that isoprene oxidation contributes to ozone production, particularly under moderate NOₓ conditions, with the rate of ozone formation being influenced by NOₓ levels and solar radiation intensity [64].

For propene and O₃ (as shown in Figure 3c), the peak negative correlation was at lag 0 (−0.4958), and the values remained strongly negative at lags −1 to −5, with correlations ranging from −0.4655 to −0.3286. The strength of the negative correlation decreased gradually as the lag moved positively but still remained substantial. For instance, at lags 1 to 5, the correlations were −0.4775, −0.4422, −0.4046, −0.3694, and −0.3435, respectively. Even at longer positive lags, such as lag 24, the value was still negative at −0.2586, indicating a sustained inverse association between propene and ozone levels over time. The immediate negative correlation suggests quick reactivity and possibly a precursor role in photochemical ozone production. Propene’s rapid reaction with ozone and its role in forming secondary organic aerosols have been documented, highlighting its significance in atmospheric processes [65].

A strong negative correlation was observed between ethene and ozone concentrations (as shown in Figure 3d) at lag 0, with a correlation coefficient of −0.5067. This indicates that high ethene levels coincide with lower ozone concentrations, and as ethene levels drop, ozone tends to rise. This inverse relationship suggests rapid reactivity, where ethene is consumed in ozone-producing reactions. Ethene reacts readily with ozone, leading to the formation of various products, and this reaction plays a significant role in atmospheric chemistry [66].

The toluene vs. O₃ relationship (as shown in Figure 3e) also demonstrated consistent negative correlations across the entire lag period. The most negative value appeared at lag 0 with a correlation of −0.4942. High negative correlations were observed at lag −1 (−0.4614), lag −2 (−0.4194), and lag −3 (−0.3791). Positive lags exhibited slightly reduced but still negative correlations, such as lag 1 (−0.4755), lag 2 (−0.4379), lag 3 (−0.3952), and lag 4 (−0.3574). The correlation gradually weakened over time, with lag 24 showing a value of −0.2342. Although the strength of correlation declined across positive lags, the relationship remained negative throughout the range. This inverse relationship is consistent with an ozone-forming role through photochemical oxidation, with toluene being depleted as ozone builds up. Toluene’s photochemical reactions with oxygen atoms lead to the formation of various products, contributing to ozone formation in the atmosphere [67].

3.4. Temporal (Diurnal, Monthly, Weekly) Variability of VOCs and O₃

Hourly trends for five VOCs (benzene, Ebenzene, ethane, ethene, ethyne) reflect a pronounced diurnal cycle that aligns closely with anthropogenic activity patterns and atmospheric boundary layer (ABL) dynamics (as shown in Figure 4). Benzene and Ebenzene concentrations exhibit a clear bimodal distribution. Concentrations rise sharply in the early morning hours, peaking between 07:00 and 10:00, which coincides with morning rush hour traffic and a shallow boundary layer that inhibits vertical dispersion. For instance, benzene levels increase from around 0.74 µg/m³ at 06:00 to over 1.09 µg/m³ by 09:00. After midday, concentrations decline due to enhanced vertical mixing and photochemical degradation under increased solar radiation. A second, smaller peak occurs in the late afternoon to early evening, typically from 17:00 to 21:00, likely reflecting evening vehicular activity and a lowering ABL. Ethane, while still showing a bimodal profile, demonstrates a relatively stable concentration throughout the day, owing to its low reactivity and longer atmospheric lifetime. This suggests a combination of local and regional sources, including fossil fuel combustion and long-range transport. Conversely, ethene and ethyne exhibit sharper morning peaks and steeper declines in the afternoon, attributable to their higher reactivity with hydroxyl radicals and short atmospheric lifetimes. These compounds are strongly associated with fresh vehicular and industrial emissions, and their reduction throughout the day supports their rapid oxidative loss [68,69].

Figure 4. Variability of benzene, Ebenzene, ethane, ethene, and ethyne concentrations.

Monthly variations in VOC concentrations reveal a distinct seasonal cycle. Benzene and Ebenzene concentrations peak in winter (January–February), with benzene reaching as high as 1.34 µg/m³ in January. Levels gradually decline towards the summer months, reaching a minimum between May and July (~0.74–0.83 µg/m³). This trend reflects a combination of factors: in winter, reduced photochemical activity limits the atmospheric degradation of VOCs, and lower mixing heights lead to pollutant accumulation near the ground. Moreover, cold-start vehicle emissions and increased heating-related combustion during winter months further exacerbate ambient concentrations [70]. In contrast, summer months are characterised by enhanced photochemical activity, which facilitates the oxidation and removal of reactive VOCs. Additionally, greater ABL height and stronger atmospheric mixing reduce surface concentrations. Ethane displays a comparatively flatter seasonal profile, consistent with its chemical stability and partial contribution from background sources. Ethene and ethyne follow a similar pattern to benzene, exhibiting winter maxima and summer minima, again attributable to seasonal differences in atmospheric oxidation capacity and boundary layer conditions [71].

These findings are consistent with previous studies across urban European environments, which have documented wintertime accumulation of VOCs due to low dispersion and limited photochemical degradation, alongside morning and evening peaks driven by local traffic emissions [68,69,70,71]. The VOC behaviour observed in the current study is characteristic of heavily trafficked urban areas, reinforcing the significance of vehicular emissions and atmospheric processes in shaping VOC exposure patterns.
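The diurnal and monthly profiles discussed above amount to averaging the hourly record by hour of day and by calendar month. The following is a minimal sketch under stated assumptions: the `mean_by` helper and the synthetic series (a morning peak near 08:00 and a winter maximum) are hypothetical and are not taken from the study's data or code.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta

def mean_by(records, key):
    """Average concentration grouped by key(timestamp); missing values skipped."""
    sums, counts = defaultdict(float), defaultdict(int)
    for ts, value in records:
        if value is None:
            continue
        k = key(ts)
        sums[k] += value
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sorted(sums)}

# One synthetic year of hourly values (illustrative only).
start = datetime(2015, 1, 1)
records = []
for h in range(365 * 24):
    ts = start + timedelta(hours=h)
    rush = 0.4 * math.exp(-((ts.hour - 8) ** 2) / 8.0)          # morning peak
    season = 0.3 * math.cos(2 * math.pi * (ts.month - 1) / 12)  # winter high
    records.append((ts, 0.8 + rush + season))

diurnal = mean_by(records, lambda ts: ts.hour)    # 24 hourly means
monthly = mean_by(records, lambda ts: ts.month)   # 12 monthly means
peak_hour = max(diurnal, key=diurnal.get)
peak_month = max(monthly, key=monthly.get)
low_month = min(monthly, key=monthly.get)
```

Grouping by `ts.weekday()` in the same way would give the weekly profile named in the section title.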

Further analysis of the remaining VOCs (propene, propane, isoprene, toluene) and O₃ is shown in Figure 5. Propane levels exhibit a clear morning peak beginning at around 06:00 and reaching maximum concentrations between 08:00 and 10:00 (~1.18 μg/m³), which coincides with traffic rush hours. These levels then decline, reaching their lowest concentrations between 03:00 and 05:00 (~0.93 μg/m³). Similarly, isoprene shows a sharp mid-morning peak (08:00–10:00) of around 1.12 μg/m³, after starting the day at significantly lower levels (~0.53–0.63 μg/m³). Toluene behaves in much the same way, peaking between 08:00 and 10:00 (1.17–1.20 μg/m³) and reaching its lowest levels from 03:00 to 05:00 (~0.51 μg/m³). The behaviour of O₃ contrasts with that of the VOCs: O₃ reaches its maximum values during the night, around 02:00–04:00 (~1.24 μg/m³), but dips sharply between 08:00 and 09:00 (~0.695–0.735 μg/m³). This pattern aligns with titration of O₃ by NO during morning traffic peaks, where freshly emitted NO from vehicle exhaust reacts with ambient ozone, a well-documented mechanism in urban air chemistry [72].

Figure 5. Variability of propene, propane, isoprene, toluene, and O₃ concentrations.

Seasonally, propane concentrations peak during winter, particularly in January (~1.36 μg/m³), due to reduced atmospheric dispersion and increased heating-related emissions. A marked dip occurs from April to July (0.83–0.85 μg/m³), likely reflecting enhanced photochemical degradation and atmospheric mixing. A similar pattern is seen with propene, which also shows winter highs (January and October, ~1.07–1.19 μg/m³) and spring/summer lows (April–July, ~0.84–0.86 μg/m³). While isoprene and toluene follow comparable seasonal cycles, O₃ concentrations tend to vary more complexly, influenced by both precursor availability and solar radiation that drives photochemical ozone formation.

These findings echo previously reported trends in urban atmospheric studies. For example, Monks et al. (2009) highlight how VOCs such as toluene and propane show morning peaks aligned with traffic activity, while ozone exhibits early morning minima due to titration [73]. Furthermore, von Schneidemesser et al. (2010) observed elevated wintertime VOC concentrations across European cities, attributed to both anthropogenic activity and meteorological stagnation [74]. Hence, the observations highlight clear diurnal and seasonal recurrence in urban pollutant behaviour. Morning VOC peaks coincide with traffic emissions, while O₃ dips during high-NOₓ periods underscore the importance of titration processes. The seasonal rise of VOCs in winter and photochemical O₃ variations in response to solar input and precursor levels are consistent with known atmospheric chemistry patterns. These recurrences are not only expected but have been systematically documented across global urban environments, reaffirming the importance of historical pattern recognition in environmental modelling.

3.5. Wind Sector Analysis of VOCs and Ozone Concentrations

The VOCs, including benzene, ethylbenzene, ethene, ethyne, isoprene, propane, propene, and toluene, exhibit similar wind direction-based concentration patterns, particularly influenced by wind sectors from the southwest (SW), west-southwest (WSW), and south-southwest (SSW), as shown in Figure 6. These directions consistently correspond to the highest average concentrations of pollutants, suggesting that the major emission sources are localized toward the southwest of the monitoring site, possibly from traffic corridors, industrial operations, and fuel-handling activities typical of urban and semi-industrialized areas.

Figure 6. Polar plot analysis of VOC and ozone concentrations by wind direction at the monitoring site.

Benzene and ethylbenzene both exhibit their highest concentrations in the SSW, SW, and WSW sectors (benzene: SSW: 1.01 μg/m³, SW: 0.850 μg/m³, WSW: 0.841 μg/m³; ethylbenzene: SW: 0.462 μg/m³, WSW: 0.437 μg/m³, SSW: 0.428 μg/m³; the full concentration data are provided in Appendix A, Tables A1–A8), suggesting similar source regions related to vehicular emissions and industrial activities, as these compounds are often associated with combustion processes and solvent use [75]. The proximity of the monitoring site to major transportation routes and industrial zones likely influences these results, reinforcing the hypothesis of localized emission hotspots along these wind paths.
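Per-sector means and maxima of the kind tabulated in Appendix A could be derived along the following lines. This is a sketch only: the text does not state how wind direction was binned, so a 16-point compass with 22.5° sectors centred on each bearing is assumed, and the function names and sample observations are hypothetical.

```python
SECTORS = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
           "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]

def sector_of(wd_degrees):
    """Map wind direction (degrees clockwise from north) to a 16-point sector."""
    idx = int(((wd_degrees % 360) + 11.25) // 22.5) % 16
    return SECTORS[idx]

def sector_stats(observations):
    """Return {sector: (mean, max)} from (wind_direction, concentration) pairs."""
    grouped = {}
    for wd, conc in observations:
        grouped.setdefault(sector_of(wd), []).append(conc)
    return {s: (sum(v) / len(v), max(v)) for s, v in grouped.items()}

# Hypothetical observations: (wind direction in degrees, concentration in ug/m3).
obs = [(225.0, 0.9), (230.0, 0.8), (200.0, 1.1), (135.0, 0.4), (140.0, 0.3)]
stats = sector_stats(obs)

# Rank sectors by mean concentration, highest first, as in Tables A1 to A8.
ranked = sorted(stats, key=lambda s: stats[s][0], reverse=True)
```

Centring each sector on its bearing (so that N spans 348.75° to 11.25°) is a design choice; a different binning would shift observations near sector boundaries.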

Similarly, ethene and propane show peak concentrations in the SSW, SW, and WSW sectors (ethene: SSW: 2.29 μg/m³, WSW: 2.14 μg/m³, SW: 2.06 μg/m³; propane: SW: 6.57 μg/m³, WSW: 6.52 μg/m³, SSW: 5.59 μg/m³), further corroborating the notion of localized pollution sources in the southwestern direction. Ethene, a byproduct of fossil fuel combustion, and propane, often associated with industrial operations and heating, exhibit a strikingly similar distribution. This similarity can be attributed to the shared emission sources, such as traffic and industrial zones, which are dominant in the southwest direction.

On the other hand, isoprene and toluene show some variations in their directional concentration patterns. Isoprene is predominantly emitted from vegetation and combustion sources, with its highest concentrations recorded in the SSW, S, and SSE wind sectors (isoprene: SSW: 0.0530 μg/m³, S: 0.0528 μg/m³, SSE: 0.0511 μg/m³). This could be indicative of biogenic emissions from nearby green spaces or vegetation in addition to traffic emissions. The observed patterns for isoprene may reflect mixed sources, with a combination of biogenic and anthropogenic contributions, as is common in urban areas with nearby natural green cover [76]. In contrast, toluene, typically associated with industrial solvents and vehicle emissions, follows a pattern similar to that of benzene and ethylbenzene, with the highest concentrations found in the SW, WSW, and SSW sectors, confirming the dominance of vehicular and industrial sources.

Ethyne (acetylene), however, exhibits a distinct trend, with the highest concentrations observed in the E and ENE wind sectors (ethyne: E: 2.53 μg/m³, ENE: 2.29 μg/m³), pointing to an emission source located to the east or northeast of the monitoring site. This directional anomaly could indicate emissions from nearby industrial zones or regional pollution sources eastward of the site. Ethyne, being a byproduct of incomplete combustion, is also commonly associated with industrial activities and regional transport emissions, and this observation may highlight the influence of larger, more distant sources or regional transport patterns.

Finally, O₃ concentrations show a clear dependence on wind direction, with the highest average concentrations observed in the SSE, SE, and S sectors (55.3 μg/m³, 50.4 μg/m³, and 44.7 μg/m³, respectively; Table A8). Ozone is a secondary pollutant formed by photochemical reactions between VOCs and NOₓ under sunny conditions. Notably, the SSE and SE sectors show among the lowest mean VOC concentrations (Tables A1–A7), so the ozone maxima do not coincide with the highest local precursor levels. Instead, SSE and SE flows may carry photochemically aged air in which precursors from traffic and industrial zones have already been processed. This directional trend supports the hypothesis that O₃ levels are influenced both by precursor sources, such as vehicular emissions, and by meteorological factors such as sunlight and temperature.

3.6. Principal Component and Clustering Analysis of VOCs and Meteorological Variables in Urban Air Quality

3.6.1. Principal Component Analysis (PCA)

PCA was performed on the standardized VOC dataset to reduce dimensionality and identify key patterns as shown in the scree plot in Figure 7. The first principal component (PC1) accounted for 39.79% of the total variance, while PC2 explained an additional 14.83%, leading to a cumulative variance of 54.63% by the second dimension. PC3 to PC6 contributed 10.25%, 8.59%, 6.97%, and 6.11%, respectively, cumulatively capturing 86.54% of the total variance. Beyond PC6, each additional component explained less than 5% of the variance, indicating diminishing returns. Based on the cumulative variance and the observed elbow point in the scree plot, retaining the first six principal components (PC1 to PC6) was considered sufficient for subsequent clustering and analysis.
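The percentages quoted above follow directly from the eigenvalues in Table A9. For a PCA on standardized variables the eigenvalues sum to the number of variables (12 here), so each component's share is its eigenvalue divided by that total. A short check using the tabulated eigenvalues (variable names are illustrative):

```python
from itertools import accumulate

# Eigenvalues for Dim 1 to Dim 12, copied from Table A9.
eigenvalues = [4.7752, 1.78, 1.23, 1.0306, 0.8367, 0.7328,
               0.5579, 0.4239, 0.2476, 0.2044, 0.11, 0.071]

total = sum(eigenvalues)                           # close to 12
explained = [100 * ev / total for ev in eigenvalues]
cumulative = list(accumulate(explained))

# Components that each explain at least 5% of the variance.
n_at_least_5 = sum(1 for pct in explained if pct >= 5)

for i, (pct, cum) in enumerate(zip(explained, cumulative), start=1):
    print(f"PC{i}: {pct:.2f}% (cumulative {cum:.2f}%)")
```

Small differences in the last decimal place relative to Table A9 can arise because the tabulated eigenvalues are themselves rounded.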

Figure 7. Scree plot of principal components (data provided in Appendix B).

PC1 was dominated by traffic-related VOC species, with the largest contributions from ethene (15.85%), propene (15.72%), benzene (15.68%), Ebenzene (14.54%), and toluene (13.97%). The second principal component (PC2) was driven mainly by isoprene (36.57%) and air temperature (19.73%), indicating a strong influence of biogenic emissions and temperature-driven variability. The third principal component (PC3) was characterized by high contributions from wind direction (49.04%) and wind speed (28.71%), suggesting the importance of atmospheric dispersion processes. Similarly, PC4 was largely influenced by ethyne (43.80%) and air temperature (27.34%). These results indicate that both anthropogenic emissions (VOC-related) and meteorological factors (temperature, wind) play crucial roles in the observed variations in pollutant concentrations.
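The text does not state how the percentage contributions were computed. One common convention, assumed here, takes the contribution of a variable to a component as its squared eigenvector entry times 100. This is consistent with each PC column of Table A11 summing to roughly 100%. The sketch below applies that convention to synthetic data with NumPy; the two latent factors and the variable names are hypothetical stand-ins, not the study's data.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
traffic = rng.normal(size=n)   # hypothetical shared traffic factor
warmth = rng.normal(size=n)    # hypothetical temperature factor

names = ["benzene", "toluene", "ethene", "isoprene", "air_temp"]
X = np.column_stack([
    traffic + 0.3 * rng.normal(size=n),
    traffic + 0.3 * rng.normal(size=n),
    traffic + 0.3 * rng.normal(size=n),
    warmth + 0.3 * rng.normal(size=n),
    warmth + 0.3 * rng.normal(size=n),
])

# Standardize, then eigendecompose the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
corr = Z.T @ Z / (n - 1)
eigvals, eigvecs = np.linalg.eigh(corr)      # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Rows are variables, columns are PCs; each column sums to 100.
contrib = 100 * eigvecs ** 2

for name, row in zip(names, contrib):
    print(f"{name:>9}: PC1 {row[0]:5.1f}%  PC2 {row[1]:5.1f}%")
```

In this toy case the three traffic-like variables account for almost all of PC1 and the two temperature-like variables for almost all of PC2, mirroring the qualitative structure described in the text.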

The combined interpretation of the scree plot and the variable contributions to PC1 and PC2 highlights the major environmental processes shaping VOC variability at the study site. The variables contributing to PC1 and PC2 are shown in Figure 8. The strong loading of traffic-related VOCs such as ethene, propene, benzene, Ebenzene, and toluene on PC1 reflects the dominant influence of primary anthropogenic emissions, primarily from vehicular and combustion sources. Meanwhile, the high contribution of isoprene and air temperature to PC2 signifies the role of biogenic activities and meteorologically driven processes, where warmer temperatures enhance natural VOC emissions. The third and fourth principal components, shaped by wind-related parameters (wind direction, wind speed) and specific VOCs like ethyne, further emphasize the significance of atmospheric dispersion and transport in modulating local pollutant concentrations. Taken together, this interpretation suggests that the urban air composition is shaped by two main processes: (1) primary anthropogenic emissions (captured in PC1) heavily driven by traffic-related VOCs, and (2) secondary biogenic and photochemical processes (captured in PC2) influenced by natural emissions and temperature. The choice to focus on PC1 and PC2 in the variable contribution plot is justified, as these two dimensions together explain more than half of the total variance, offering the clearest insights into the dominant environmental processes affecting air quality at the study site. By identifying these dominant patterns, PCA not only simplifies the complex dataset but also provides a scientific basis for targeted air pollution control strategies, distinguishing between traffic management interventions and temperature- or wind-related considerations.

Figure 8. Variable contributions to PC1 and PC2 (data provided in Appendix B, Tables A9–A11).

3.6.2. K-Means Clustering Analysis

To classify distinct air quality regimes based on VOCs, O₃, and meteorological variables, K-means clustering was applied to the normalized (scaled) dataset. The variables considered included benzene, Ebenzene, ethene, ethyne, isoprene, propane, propene, toluene, O₃, air temperature, wind speed, and wind direction. Given the computational limitations encountered during clustering of the full dataset, a random sample of 500 points was utilized for cluster assignment. Following this, cluster memberships were mapped back onto the full cleaned dataset for interpretation. The optimal number of clusters (k = 3) was determined based on visual inspection of within-cluster sum of squares (WSS) plots and empirical observations of the data structure.
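The sample-then-map-back procedure described above can be sketched as follows. Several details are assumptions rather than the authors' method: the text does not say how memberships were mapped to the full dataset, so nearest-centroid assignment is assumed; a deterministic farthest-first initialisation is swapped in for simplicity; and the synthetic two-variable data stand in for the twelve standardized variables.

```python
import numpy as np

def assign(X, centroids):
    """Label each row of X with the index of its nearest centroid."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)

def farthest_first(X, k):
    """Deterministic start: take X[0], then repeatedly add the farthest point."""
    centroids = X[:1].copy()
    while len(centroids) < k:
        gaps = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        centroids = np.vstack([centroids, X[gaps.min(axis=1).argmax()]])
    return centroids

def kmeans(X, k, n_iter=100):
    """Plain Lloyd iterations; returns the fitted centroids."""
    centroids = farthest_first(X, k)
    for _ in range(n_iter):
        labels = assign(X, centroids)
        updated = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    return centroids

def wss(X, centroids):
    """Within-cluster sum of squares, the quantity inspected in an elbow plot."""
    labels = assign(X, centroids)
    return float(((X - centroids[labels]) ** 2).sum())

# Synthetic stand-in for the cleaned dataset: three separated regimes.
rng = np.random.default_rng(42)
centres = np.array([[0.0, 0.0], [5.0, 5.0], [10.0, 0.0]])
full = np.vstack([c + rng.normal(scale=0.5, size=(2000, 2)) for c in centres])
full = (full - full.mean(axis=0)) / full.std(axis=0)            # standardize

sample = full[rng.choice(len(full), size=500, replace=False)]   # 500-point sample
wss_by_k = {k: wss(sample, kmeans(sample, k)) for k in range(1, 6)}
centroids = kmeans(sample, 3)          # fit on the sample only
full_labels = assign(full, centroids)  # map memberships onto every row
```

Fitting on a sample keeps the iterative step cheap, while the final assignment is a single distance computation over the full dataset. Whether a 500-point sample represents the full record well is a separate question that this sketch does not address.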

The cluster characteristics were visualized using a spider plot (radar plot) as shown in Figure 9, where each axis represents one of the standardized variables scaled between 0 and 1. In the spider plot, each coloured polygon corresponds to the average profile of one cluster across all variables, allowing intuitive visual comparison. Cluster 1 exhibited the highest normalized values for benzene, Ebenzene, ethene, ethyne, propane, propene, and toluene (all at or near 1.0 on the standardized scale) but had very low O₃ concentrations (scaled 0.0) and lower air temperatures and wind speeds. This indicates a pollution regime characterized by fresh, primary VOC emissions under relatively stagnant and cooler atmospheric conditions. Cluster 2 showed intermediate VOC concentrations (scaled around 0.3–0.4) and moderately elevated O₃ levels (scaled around 0.17) with slightly higher air temperatures and wind speeds compared to Cluster 1, suggesting transitional or mixed air masses where some photochemical processing of VOCs had occurred. Cluster 3, by contrast, showed the lowest VOC concentrations (scaled near 0.0 for most VOCs) but the highest O₃ levels (scaled 1.0), highest air temperature (1.0), and highest wind speed (1.0). This profile reflects aged air masses where primary VOCs have largely reacted, resulting in elevated secondary pollutants like ozone under warmer, sunnier, and windier conditions.
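Scaled values of exactly 0.0 and 1.0 would arise if each variable's cluster means were min-max rescaled across the three clusters before plotting. The text does not state the scaling explicitly, so this is an assumption. A minimal sketch with hypothetical cluster means (not values from the study):

```python
def minmax_by_variable(cluster_means):
    """Rescale each variable to [0, 1] across clusters."""
    variables = list(next(iter(cluster_means.values())))
    scaled = {c: {} for c in cluster_means}
    for var in variables:
        values = [cluster_means[c][var] for c in cluster_means]
        lo, hi = min(values), max(values)
        for c in cluster_means:
            # A variable with identical means in every cluster maps to 0.0.
            scaled[c][var] = 0.0 if hi == lo else (cluster_means[c][var] - lo) / (hi - lo)
    return scaled

# Hypothetical per-cluster mean values, for illustration only.
cluster_means = {
    1: {"benzene": 1.30, "O3": 10.0, "air_temp": 6.0},
    2: {"benzene": 0.80, "O3": 17.5, "air_temp": 11.0},
    3: {"benzene": 0.50, "O3": 55.0, "air_temp": 18.0},
}
scaled = minmax_by_variable(cluster_means)
```

Under this scaling, the cluster with the largest mean for a variable always plots at 1.0 and the smallest at 0.0, so the radar plot shows the relative ordering of clusters rather than absolute concentration differences.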

Figure 9. Radar (spider) plot showing normalized mean values of VOCs, O₃, temperature, wind speed, and direction across three K-means clusters, highlighting distinct pollution and meteorological patterns.

The K-means clustering identified three distinct pollution regimes, each representing characteristic atmospheric and emission scenarios relevant to urban air quality management.

Cluster 1 was characterised by high concentrations of VOCs and very low ozone levels. This scenario is indicative of fresh primary emissions dominated by traffic and combustion sources, under stagnant meteorological conditions (low temperature and low wind speed). These conditions limit dispersion and inhibit photochemical activity, leading to pollutant accumulation. From a policy perspective, this scenario highlights the need for stricter traffic emission controls during early morning and winter periods when dispersion is weakest.

Cluster 2 represented transitional or mixed scenarios with moderate VOC and ozone levels. This cluster occurred under slightly warmer and windier conditions, suggesting a blend of primary emissions and early-stage photochemical activity. Such conditions are common during late morning and shoulder seasons (spring/autumn). Interventions during these periods should focus on both emission reduction and photochemical monitoring, as this scenario can rapidly evolve toward secondary pollution episodes.

Cluster 3 featured low VOCs but elevated ozone, occurring during the warmest and windiest periods. These conditions favour photochemical processing and reflect aged urban air masses. This scenario exemplifies VOC-limited ozone formation, where even modest VOC levels lead to high ozone production due to abundant NOₓ and strong solar radiation. Effective mitigation under these conditions requires prioritising VOC reduction, particularly of highly reactive species (e.g., aromatics and alkenes), while also considering regional transport contributions.

These clusters provide a practical framework for dynamic air quality management. Instead of uniform policies, pollution mitigation can be tailored by time of day, season, and prevailing meteorology. For example, targeted restrictions on traffic emissions during Cluster 1 scenarios and VOC-specific industrial controls during Cluster 3 events could significantly reduce health risks and exceedances of regulatory thresholds.

The use of K-means clustering coupled with spider plot visualization allows identification of different atmospheric regimes based on pollutant and meteorological profiles. Scientifically, this aligns with the known behaviour of photochemical pollution: VOCs serve as precursors that, under sufficient solar radiation and in the presence of NOx, lead to secondary ozone formation. The observed inverse relationship between VOC concentrations and ozone levels across the clusters is consistent with classical photochemical smog theories [77]. Similar multi-cluster patterns have been reported in previous air quality studies where fresh emissions dominated low-ozone clusters, and aged, oxidized air masses showed elevated ozone [78,79]. Hence, this analysis reveals a clear separation of pollution regimes, highlighting the transition from fresh emission events (Cluster 1), through mixed conditions (Cluster 2), to photochemically aged air masses rich in secondary pollutants (Cluster 3). The clustering approach thus provides valuable insights into VOC dynamics, atmospheric aging, and secondary pollutant formation, offering a powerful method for source attribution and air quality management strategies.

While the patterns identified in this study are robust for the Marylebone Road corridor, it is important to note that emission profiles, meteorological influences, and chemical regimes may vary across urban settings. Therefore, applying this methodology to other cities would provide valuable comparative insights and test the generalisability of the observed pollution regimes.

4. Conclusions

This study comprehensively examined the behaviour, sources, and atmospheric interactions of key volatile organic compounds (VOCs) and ozone over an 8-year period along Marylebone Road, London, an urban corridor dominated by traffic emissions. The integration of multivariate statistics, machine learning, and meteorological analysis allowed us to identify mechanistic insights into pollutant formation and transformation dynamics, rather than presenting isolated statistical associations.

The key findings aligned with the study’s objectives. Firstly, strong correlations among VOCs such as benzene, toluene, ethylbenzene, and ethene confirmed their primary origin in vehicular and fossil fuel combustion emissions. Secondly, time-lagged inverse correlations with ozone, revealed through cross-correlation analysis, demonstrated the operation of a VOC-limited regime. This regime is characterised by ozone titration due to high NO levels, delaying photochemical ozone build-up until pollutants have aged and dispersed. Thirdly, temporal trends showed that VOC concentrations peaked during winter and morning/evening rush hours, reflecting reduced dispersion and increased emissions, while ozone peaked in summer under conditions of strong solar radiation and greater atmospheric mixing, supporting its secondary formation pathway. Wind-sector analysis further revealed spatial heterogeneity in pollutant sources, with VOCs transported from southwest traffic corridors and ozone peaking under southeast winds carrying photochemically aged air masses.

Principal Component Analysis attributed nearly 40% of VOC variability to traffic emissions (PC1) and ~15% to biogenic and temperature-sensitive emissions (PC2). K-means clustering further identified three pollution regimes: fresh emission events with high VOCs and low ozone, mixed regimes with partial transformation, and aged air masses with low VOCs and elevated ozone, indicating progressive chemical evolution under meteorologically favourable conditions.

These insights advance our understanding of how primary emissions interact with meteorology to shape secondary pollution outcomes in dense urban environments. By revealing when, where, and how VOCs contribute to or inhibit ozone formation, the study supports the development of targeted air quality interventions, particularly under VOC-limited regimes, where reducing NOₓ alone may not be effective. The clustering results not only revealed the chemical and meteorological characteristics of each pollution scenario but also provided actionable insights for policy design. By recognising when specific emission sources and atmospheric conditions dominate, urban planners and regulators can implement more responsive, scenario-specific strategies, such as temporal traffic restrictions, VOC monitoring during high-ozone periods, and regional coordination for pollution transport.

The study limitations include the use of a single roadside monitoring location, which may not capture the full spatial variability of emissions and ozone dynamics across the urban area. Additionally, while the study applied advanced statistical tools, chemical transport modelling was not used to simulate reaction pathways or regional transport explicitly. Future work could build on these findings by incorporating real-time chemical modelling, expanding spatial coverage, and evaluating health exposure impacts to inform policy decisions more comprehensively.

Author Contributions

Conceptualization, K.P.W., B.V.S.C., A.S., M.J.B. and K.L.S.; methodology, K.P.W., B.V.S.C. and A.S.; Validation, K.P.W., A.S., M.J.B. and K.L.S.; formal analysis, B.V.S.C. and K.P.W.; investigation, B.V.S.C. and K.P.W.; resources, B.V.S.C., M.J.B. and K.L.S.; data curation, K.P.W. and B.V.S.C.; writing—original draft preparation, B.V.S.C., K.P.W., A.S., M.J.B. and K.L.S.; writing—review and editing, B.V.S.C., K.P.W., A.S., M.J.B. and K.L.S.; visualization, B.V.S.C. and K.P.W.; supervision, K.P.W., M.J.B. and K.L.S.; project administration, B.V.S.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data will be made available on request.

Acknowledgments

We would like to express our sincere gratitude to the Centre for Environment and Societies, School of Applied Sciences, at the University of Brighton, United Kingdom, and School of Science and Technology, Nottingham Trent University, Nottingham for their invaluable support and resources in the preparation of this research. We are deeply appreciative of the opportunities provided by the centre, which have enriched our understanding of the subject matter.

Conflicts of Interest

Author Balendra V. S. Chauhan is the director of the company Tech GPT Ltd., NE6 2SR, Newcastle upon Tyne, UK. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. For Section 3.5 Wind Sector Analysis of VOCs and Ozone Concentrations

Table A1. Data for Toluene concentrations from polar plot.

Wind Sector	Mean Concentration (μg/m³)	Max Concentration (μg/m³)
SW	2.14	3.85
WSW	1.99	3.85
SSW	1.91	3.8
NE	1.85	3.92
N	1.62	3.9
ENE	1.52	3.83
NNE	1.46	3.9
S	1.22	3.85
E	1.11	3.81
ESE	0.712	3.91
SSE	0.67	3.75
SE	0.655	3.65

Table A2. Data for Benzene concentration from polar plot.

Wind Sector	Mean Concentration (μg/m³)	Max Concentration (μg/m³)
SSW	1.01	1.28
SW	0.85	1.26
WSW	0.841	1.26
NE	0.779	1.32
N	0.757	1.28
ENE	0.694	1.29
S	0.681	1.26
NNE	0.612	1.3
E	0.584	1.26
ESE	0.425	1.27
SSE	0.407	1.23
SE	0.382	1.2

Table A3. Data for EBenzene concentrations from polar plot.

Wind Sector	Mean Concentration (μg/m³)	Max Concentration (μg/m³)
SW	0.462	0.81
WSW	0.437	0.816
SSW	0.428	0.799
NE	0.375	0.818
N	0.346	0.825
ENE	0.333	0.795
NNE	0.302	0.819
S	0.252	0.808
E	0.246	0.796
ESE	0.163	0.82
SE	0.152	0.763
SSE	0.146	0.785

Table A4. Data for Ethene concentrations from polar plot.

Wind Sector	Mean Concentration (μg/m³)	Max Concentration (μg/m³)
SSW	2.29	3.15
WSW	2.14	3.19
SW	2.06	3.19
NE	1.86	3.23
ENE	1.86	3.15
N	1.64	3.21
E	1.55	3.01
NNE	1.49	3.25
S	1.26	3.05
ESE	0.851	3.06
SE	0.755	2.87
SSE	0.692	2.96

Table A5. Data for Ethyne concentrations from polar plot.

Wind Sector	Mean Concentration (μg/m³)	Max Concentration (μg/m³)
E	2.53	4.29
ENE	2.29	3.96
WSW	1.06	1.34
SSW	1.03	1.31
NE	0.97	2.75
SW	0.94	1.34
N	0.75	1.4
NNE	0.67	1.48
ESE	0.63	1.84
S	0.61	1.35
SSE	0.44	1.32
SE	0.42	1.29

Table A6. Data for Propane concentration from polar plot.

Wind Sector	Mean Concentration (μg/m³)	Max Concentration (μg/m³)
SW	6.57	10.6
WSW	6.52	10.5
SSW	5.59	10.6
N	3.98	10.7
NE	3.86	10.8
E	3.84	10.6
ENE	3.83	10.6
S	3.77	10.8
NNE	3.14	10.6
ESE	2.88	10.8
SSE	2.64	10.6
SE	2.41	10.4

Table A7. Data for Propene concentration from polar plot.

Wind Sector	Mean Concentration (μg/m³)	Max Concentration (μg/m³)
SSW	0.948	1.37
WSW	0.936	1.38
N	0.904	1.39
NE	0.875	1.39
SW	0.87	1.37
NNE	0.776	1.39
ENE	0.717	1.37
S	0.702	1.37
SSE	0.601	1.35
E	0.564	1.36
SE	0.523	1.32
ESE	0.467	1.38

Table A8. Data for O₃ concentration from polar plot.

Wind Sector	Mean Concentration (μg/m³)	Max Concentration (μg/m³)
SSE	55.3	72.8
SE	50.4	70.1
S	44.7	66.8
ESE	43.1	53.4
E	35.5	48.2
NNE	33.2	54.9
ENE	32.7	51.4
NE	26.5	46.9
SSW	26.3	37.4
N	26.1	58.3
SW	25.2	31.4
WSW	19.9	28.5

Appendix B. For Section 3.6 Principal Component and Clustering Analysis of VOCs and Meteorological Variables in Urban Air Quality

Table A9. PCA Result table.

Dimension	Eigenvalue	Variance Explained (%)	Cumulative Variance (%)
Dim 1	4.7752	39.79%	39.79%
Dim 2	1.78	14.83%	54.63%
Dim 3	1.23	10.25%	64.88%
Dim 4	1.0306	8.59%	73.46%
Dim 5	0.8367	6.97%	80.44%
Dim 6	0.7328	6.11%	86.54%
Dim 7	0.5579	4.65%	91.19%
Dim 8	0.4239	3.53%	94.73%
Dim 9	0.2476	2.06%	96.79%
Dim 10	0.2044	1.70%	98.49%
Dim 11	0.11	0.92%	99.41%
Dim 12	0.071	0.59%	100.00%
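
The variance columns in Table A9 follow from the eigenvalues: each share is the eigenvalue divided by the sum of all eigenvalues. That sum is close to 12, which is consistent with a PCA on 12 standardised variables. The Python sketch below recomputes both columns from the tabulated eigenvalues. The variable names are illustrative, and the count of eigenvalues above 1 (the Kaiser rule of thumb) is shown only as a common heuristic, not as the authors' stated selection rule.

```python
# Eigenvalues copied from Table A9 (Dim 1 to Dim 12).
eigenvalues = [4.7752, 1.78, 1.23, 1.0306, 0.8367, 0.7328,
               0.5579, 0.4239, 0.2476, 0.2044, 0.11, 0.071]

# For standardised inputs the eigenvalues sum to the number of variables.
total = sum(eigenvalues)

# Share of total variance carried by each dimension, in percent.
variance_pct = [100.0 * ev / total for ev in eigenvalues]

# Running total of the shares.
cumulative_pct = []
running = 0.0
for share in variance_pct:
    running += share
    cumulative_pct.append(running)

# Heuristic only: number of dimensions with eigenvalue above 1.
n_above_one = sum(1 for ev in eigenvalues if ev > 1.0)

for dim, (ev, share, cum) in enumerate(
        zip(eigenvalues, variance_pct, cumulative_pct), start=1):
    print(f"Dim {dim:2d}  eigenvalue={ev:6.4f}  "
          f"variance={share:5.2f}%  cumulative={cum:6.2f}%")
print(f"Eigenvalues above 1: {n_above_one}")
```

Under this heuristic, four dimensions have eigenvalues above 1, which matches the number of PCs reported in Table A11.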

Table A10. PCA Summary.

PC	Variance explained	Cumulative
PC1	39.80%	39.80%
PC2	14.80%	54.60%
PC3	10.20%	64.90%
PC4	8.60%	73.50%
PC5	7.00%	80.40%
PC6	6.10%	86.50%

Table A11. Contributions of Variables to Top 4 PCs.

Variable	PC1 (%)	PC2 (%)	PC3 (%)	PC4 (%)
Benzene	15.68	7.33	0.37	1.91
EBenzene	14.54	11.22	0.43	0.01
Ethene	15.85	0.82	1.11	0.23
Ethyne	2.79	0.03	7.53	43.8
Isoprene	2.21	36.57	4.2	5.43
Propane	8.87	8.01	3.04	0.03
Propene	15.72	0.19	0.08	6.12
Toluene	13.97	0.36	1.27	12.72
O₃	7.64	9.81	2.27	2.05
Air Temp	1.03	19.73	1.95	27.34
Wind Speed (ws)	1.52	5.92	28.71	0.18
Wind Direction (wd)	0.17	0.02	49.04	0.18
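
Each PC column in Table A11 sums to roughly 100%, so with 12 variables a uniform contribution would be 100/12 ≈ 8.33% per variable. The Python sketch below assumes that uniform share as a reference line, which is a common convention in PCA contribution plots rather than a threshold stated by the authors, and lists the variables that exceed it for each PC. The values are copied from Table A11; the function and variable names are illustrative.

```python
# Contributions (%) copied from Table A11: variable -> (PC1, PC2, PC3, PC4).
contributions = {
    "Benzene": (15.68, 7.33, 0.37, 1.91),
    "EBenzene": (14.54, 11.22, 0.43, 0.01),
    "Ethene": (15.85, 0.82, 1.11, 0.23),
    "Ethyne": (2.79, 0.03, 7.53, 43.8),
    "Isoprene": (2.21, 36.57, 4.2, 5.43),
    "Propane": (8.87, 8.01, 3.04, 0.03),
    "Propene": (15.72, 0.19, 0.08, 6.12),
    "Toluene": (13.97, 0.36, 1.27, 12.72),
    "O3": (7.64, 9.81, 2.27, 2.05),
    "Air Temp": (1.03, 19.73, 1.95, 27.34),
    "Wind Speed": (1.52, 5.92, 28.71, 0.18),
    "Wind Direction": (0.17, 0.02, 49.04, 0.18),
}

# Assumed reference line: the share each variable would have if all were equal.
uniform_share = 100.0 / len(contributions)


def dominant_variables(pc_index):
    """Variables above the uniform share for one PC, largest contribution first."""
    ranked = sorted(contributions.items(),
                    key=lambda item: item[1][pc_index], reverse=True)
    return [name for name, values in ranked if values[pc_index] > uniform_share]


# Sanity check: each column should sum to about 100%.
column_sums = [sum(v[i] for v in contributions.values()) for i in range(4)]

for i in range(4):
    print(f"PC{i + 1}: sum={column_sums[i]:.2f}%  "
          f"above uniform share: {dominant_variables(i)}")
```

With this reference line, PC1 is led by ethene, propene, benzene, ethylbenzene, toluene, and propane, while PC2 is led by isoprene and air temperature. This is in line with the traffic-related and biogenic/temperature-driven interpretation given in the abstract.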

References

Vereecken, L.; Aumont, B.; Barnes, I.; Bozzelli, J.W.; Goldman, M.J.; Green, W.H.; Madronich, S.; Mcgillen, M.R.; Mellouki, A.; Orlando, J.J.; et al. Perspective on mechanism development and structure-activity relationships for gas-phase atmospheric chemistry. Int. J. Chem. Kinet. 2018, 50, 435–469. [Google Scholar] [CrossRef]
González-Martín, J.; Kraakman, N.J.R.; Pérez, C.; Lebrero, R.; Muñoz, R. A state–of–the-art review on indoor air pollution and strategies for indoor air pollution control. Chemosphere 2021, 262, 128376. [Google Scholar] [CrossRef]
Sarigiannis, D.A.; Karakitsios, S.P.; Gotti, A.; Liakos, I.L.; Katsoyiannis, A. Exposure to major volatile organic compounds and carbonyls in European indoor environments and associated health risk. Environ. Int. 2011, 37, 743–765. [Google Scholar] [CrossRef] [PubMed]
Chowdhury, S.; Champagne, P. Risk from exposure to trihalomethanes during shower: Probabilistic assessment and control. Sci. Total Environ. 2009, 407, 1570–1578. [Google Scholar] [CrossRef] [PubMed]
Zhou, X.; Zhou, X.; Wang, C.; Zhou, H. Environmental and human health impacts of volatile organic compounds: A perspective review. Chemosphere 2023, 313, 137489. [Google Scholar] [CrossRef] [PubMed]
Laothawornkitkul, J.; Taylor, J.E.; Paul, N.D.; Hewitt, C.N. Biogenic volatile organic compounds in the Earth system. New Phytol. 2009, 183, 27–51. [Google Scholar] [CrossRef]
Michanowicz, D.R.; Dayalu, A.; Nordgaard, C.L.; Buonocore, J.J.; Fairchild, M.W.; Ackley, R.; Schiff, J.E.; Liu, A.; Phillips, N.G.; Schulman, A. Home is where the pipeline ends: Characterization of volatile organic compounds present in natural gas at the point of the residential end user. Environ. Sci. Technol. 2022, 56, 10258–10268. [Google Scholar] [CrossRef]
Crippa, M.; Guizzardi, D.; Pisoni, E.; Solazzo, E.; Guion, A.; Muntean, M.; Florczyk, A.; Schiavina, M.; Melchiorri, M.; Hutfilter, A.F. Global anthropogenic emissions in urban areas: Patterns, trends, and challenges. Environ. Res. Lett. 2021, 16, 074033. [Google Scholar] [CrossRef]
Liu, H.; Man, H.; Cui, H.; Wang, Y.; Deng, F.; Wang, Y.; Yang, X.; Xiao, Q.; Zhang, Q.; Ding, Y. An updated emission inventory of vehicular VOCs and IVOCs in China. Atmos. Chem. Phys. 2017, 17, 12709–12724. [Google Scholar] [CrossRef]
Wang, M.; Li, S.; Zhu, R.; Zhang, R.; Zu, L.; Wang, Y.; Bao, X. On-road tailpipe emission characteristics and ozone formation potentials of VOCs from gasoline, diesel and liquefied petroleum gas fueled vehicles. Atmos. Environ. 2020, 223, 117294. [Google Scholar] [CrossRef]
Zhou, L.; Jiao, X.; Yang, B.; Yuan, W.; Zhao, W.; Zhang, L.; Huang, W.; Long, S.; Xu, J.; Shen, H. The Impact of Indoor Environments on the Abundance of Urban Outdoor VOCs. Environ. Sci. Technol. 2025, 59, 9654–9664. [Google Scholar] [CrossRef] [PubMed]
Alford, K.L.; Kumar, N. Pulmonary health effects of indoor volatile organic compounds—A meta-analysis. Int. J. Environ. Res. Public Health 2021, 18, 1578. [Google Scholar] [CrossRef] [PubMed]
Woolley, T. Building Materials, Health and Indoor Air Quality: Volume 2; Routledge: Oxfordshire, UK, 2024. [Google Scholar]
Gao, M.; Liu, W.; Wang, H.; Shao, X.; Shi, A.; An, X.; Li, G.; Nie, L. Emission factors and characteristics of volatile organic compounds (VOCs) from adhesive application in indoor decoration in China. Sci. Total Environ. 2021, 779, 145169. [Google Scholar] [CrossRef] [PubMed]
Halios, C.H.; Landeg-Cox, C.; Lowther, S.D.; Middleton, A.; Marczylo, T.; Dimitroulopoulou, S. Chemicals in European residences–Part I: A review of emissions, concentrations and health effects of volatile organic compounds (VOCs). Sci. Total Environ. 2022, 839, 156201. [Google Scholar] [CrossRef]
Noorian Najafabadi, S.A.; Sugano, S.; Bluyssen, P.M. Impact of carpets on indoor air quality. Appl. Sci. 2022, 12, 12989. [Google Scholar] [CrossRef]
Calderon, L.; Maddalena, R.; Russell, M.; Chen, S.; Nolan, J.E.; Bradman, A.; Harley, K.G. Air concentrations of volatile organic compounds associated with conventional and “green” cleaning products in real-world and laboratory settings. Indoor Air 2022, 32, e13162. [Google Scholar] [CrossRef]
Palmisani, J.; Di Gilio, A.; Cisternino, E.; Tutino, M.; de Gennaro, G. Volatile Organic Compound (VOC) emissions from a personal care polymer-based item: Simulation of the inhalation exposure scenario indoors under actual conditions of use. Sustainability 2020, 12, 2577. [Google Scholar] [CrossRef]
Barrese, E.; Gioffrè, A.; Scarpelli, M.; Turbante, D.; Trovato, R.; Iavicoli, S. Indoor pollution in work office: VOCs, formaldehyde and ozone by printer. Occup. Dis. Environ. Med. 2014, 2, 49–55. [Google Scholar] [CrossRef][Green Version]
Pajaro-Castro, N.; Caballero-Gallardo, K.; Olivero-Verbel, J. Identification of volatile organic compounds (VOCs) in plastic products using gas chromatography and mass spectrometry (GC/MS). Rev. Ambiente Água 2014, 9, 610–620. [Google Scholar]
Tabatabaei, Z.; Baghapour, M.A.; Hoseini, M.; Fararouei, M.; Abbasi, F.; Baghapour, M. Assessing BTEX concentrations emitted by hookah smoke in indoor air of residential buildings: Health risk assessment for children. J. Environ. Health Sci. Eng. 2021, 19, 1653–1665. [Google Scholar] [CrossRef]
Smog, P.; Sillman, S. Tropospheric Ozone and. Environ. Geochem. 2005, 9, 407. [Google Scholar]
Nelson, B.S.; Stewart, G.J.; Drysdale, W.S.; Newland, M.J.; Vaughan, A.R.; Dunmore, R.E.; Edwards, P.M.; Lewis, A.C.; Hamilton, J.F.; Acton, W.J. In situ ozone production is highly sensitive to volatile organic compounds in Delhi, India. Atmos. Chem. Phys. 2021, 21, 13609–13630. [Google Scholar] [CrossRef]
Filella, I.; Penuelas, J. Daily, weekly, and seasonal time courses of VOC concentrations in a semi-urban area near Barcelona. Atmos. Environ. 2006, 40, 7752–7769. [Google Scholar] [CrossRef]
Borbon, A.; Gilman, J.; Kuster, W.; Grand, N.; Chevaillier, S.; Colomb, A.; Dolgorouky, C.; Gros, V.; Lopez, M.; Sarda-Esteve, R. Emission ratios of anthropogenic volatile organic compounds in northern mid-latitude megacities: Observations versus emission inventories in Los Angeles and Paris. J. Geophys. Res. Atmos. 2013, 118, 2041–2057. [Google Scholar] [CrossRef]
Yu, C.H.; Zhu, X.; Fan, Z.-h. Spatial/temporal variations and source apportionment of VOCs monitored at community scale in an urban area. PLoS ONE 2014, 9, e95734. [Google Scholar] [CrossRef]
Küfeoğlu, S. Industrial Process Emissions. In Net Zero: Decarbonizing the Global Economies; Springer: Berlin/Heidelberg, Germany, 2024; pp. 341–414. [Google Scholar]
Lu, X.; Zhang, L.; Shen, L. Meteorology and climate influences on tropospheric ozone: A review of natural sources, chemistry, and transport patterns. Curr. Pollut. Rep. 2019, 5, 238–260. [Google Scholar] [CrossRef]
Jain, A.; Babu, V.; Saxena, M.; Aigal, A.; Singal, S.K.; Koganti, R.; Nandi, S. Effect of Gasoline Composition (Olefins, Aromatics and Benzene) on Automotive Exhaust Emissions–A Literature Review; SAE Technical Paper: Warrendale, PA, USA, 2004. [Google Scholar]
Ng, K.; Cheng, Z. Environmental monitoring of benzene and alkylated benzene from vehicular emissions. Environ. Monit. Assess. 1997, 44, 437–441. [Google Scholar] [CrossRef]
Rösch, C.; Kohajda, T.; Röder, S.; von Bergen, M.; Schlink, U. Relationship between sources and patterns of VOCs in indoor air. Atmos. Pollut. Res. 2014, 5, 129–137. [Google Scholar] [CrossRef]
Sun, X.; Wang, H.; Guo, Z.; Lu, P.; Song, F.; Liu, L.; Liu, J.; Rose, N.L.; Wang, F. Positive matrix factorization on source apportionment for typical pollutants in different environmental media: A review. Environ. Sci. Process. Impacts 2020, 22, 239–255. [Google Scholar] [CrossRef]
Frischmon, C.; Hannigan, M. VOC source apportionment: How monitoring characteristics influence positive matrix factorization (PMF) solutions. Atmos. Environ. X 2024, 21, 100230. [Google Scholar] [CrossRef]
Yuan, B.; Shao, M.; De Gouw, J.; Parrish, D.D.; Lu, S.; Wang, M.; Zeng, L.; Zhang, Q.; Song, Y.; Zhang, J. Volatile organic compounds (VOCs) in urban air: How chemistry affects the interpretation of positive matrix factorization (PMF) analysis. J. Geophys. Res. Atmos. 2012, VOL. 117, D24302. [Google Scholar] [CrossRef]
Valach, A.; Langford, B.; Nemitz, E.; MacKenzie, A.; Hewitt, C. Seasonal and diurnal trends in concentrations and fluxes of volatile organic compounds in central London. Atmos. Chem. Phys. 2015, 15, 7777–7796. [Google Scholar] [CrossRef]
Khorshidi, N.; Parsa, M.; Lentz, D.R.; Sobhanverdi, J. Identification of heavy metal pollution sources and its associated risk assessment in an industrial town using the K-means clustering technique. Appl. Geochem. 2021, 135, 105113. [Google Scholar] [CrossRef]
Licen, S.; Astel, A.; Tsakovski, S. Self-organizing map algorithm for assessing spatial and temporal patterns of pollutants in environmental compartments: A review. Sci. Total Environ. 2023, 878, 163084. [Google Scholar] [CrossRef] [PubMed]
Jenkin, M.E.; Clemitshaw, K.C. Ozone and other secondary photochemical pollutants: Chemical processes governing their formation in the planetary boundary layer. Atmos. Environ. 2000, 34, 2499–2527. [Google Scholar] [CrossRef]
Wang, P.; Chen, Y.; Hu, J.; Zhang, H.; Ying, Q. Attribution of tropospheric ozone to NO x and VOC emissions: Considering ozone formation in the transition regime. Environ. Sci. Technol. 2018, 53, 1404–1412. [Google Scholar] [CrossRef]
Liu, Y.; Chen, T.; Ma, Z.; Li, Q.; Gao, Y.; Xue, L.; Wang, W. Variation of biogenic VOC contribution to ozone formation with reduced anthropogenic precursor emissions: Coupling online observation and future scenario simulation. Sci. Total Environ. 2025, 961, 178380. [Google Scholar] [CrossRef]
Soni, V.; Singh, P.; Shree, V.; Goel, V. Effects of VOCs on human health. In Air Pollution and Control; Springer: Singapore, 2018; pp. 119–142. [Google Scholar] [CrossRef]
Wang, L.; Du, J.; Wu, X.; Gan, Z. Assessing the impact of volatile organic compounds on cardiovascular health: Insights from the National Health and nutrition examination survey 2011–2020. Ecotoxicol. Environ. Saf. 2025, 293, 118050. [Google Scholar] [CrossRef]
Landeg-Cox, C.; Middleton, A.; Halios, C.H.; Marczylo, T.; Dimitroulopoulou, S. Chemicals in European residences—Part II: A review of emissions, concentrations, and health effects of Semi-Volatile Organic Compounds (SVOCs). Environments 2025, 12, 40. [Google Scholar] [CrossRef]
Carocho, M.; Barreiro, M.F.; Morales, P.; Ferreira, I.C. Adding molecules to food, pros and cons: A review on synthetic and natural food additives. Compr. Rev. Food Sci. Food Saf. 2014, 13, 377–399. [Google Scholar] [CrossRef]
Vardoulakis, S.; Dimitroulopoulou, C.; Thornes, J.; Lai, K.-M.; Taylor, J.; Myers, I.; Heaviside, C.; Mavrogianni, A.; Shrubsole, C.; Chalabi, Z. Impact of climate change on the domestic indoor environment and associated health risks in the UK. Environ. Int. 2015, 85, 299–313. [Google Scholar] [CrossRef] [PubMed]
Tsai, W.-T. An overview of health hazards of volatile organic compounds regulated as indoor air pollutants. Rev. Environ. Health 2019, 34, 81–89. [Google Scholar] [CrossRef] [PubMed]
Chauhan, B.V.; Smallbone, K.L.; Berg, M.; Wyche, K.P. The temporal evolution of HCHO and changes in atmospheric composition in the southeast of the United Kingdom. Case Stud. Chem. Environ. Eng. 2025, 11, 101092. [Google Scholar] [CrossRef]
McCarthy, M.C.; Hafner, H.R.; Montzka, S.A. Background concentrations of 18 air toxics for North America. J. Air Waste Manag. Assoc. 2006, 56, 3–11. [Google Scholar] [CrossRef]
Chauhan, B.V.; Corada, K.; Young, C.; Smallbone, K.L.; Wyche, K.P. Review on Sampling Methods and Health Impacts of Fine (PM_2.5, ≤2.5 µm) and Ultrafine (UFP, PM_0.1, ≤0.1 µm) Particles. Atmosphere 2024, 15, 572. [Google Scholar] [CrossRef]
Puttaswamy, N.; Natarajan, S.; Saidam, S.R.; Mukhopadhyay, K.; Sadasivam, S.; Sambandam, S.; Balakrishnan, K. Evaluation of health risks associated with exposure to volatile organic compounds from household fuel combustion in southern India. Environ. Adv. 2021, 4, 100043. [Google Scholar] [CrossRef]
Fang, L.; Norris, C.; Johnson, K.; Cui, X.; Sun, J.; Teng, Y.; Tian, E.; Xu, W.; Li, Z.; Mo, J. Toxic volatile organic compounds in 20 homes in Shanghai: Concentrations, inhalation health risks, and the impacts of household air cleaning. Build. Environ. 2019, 157, 309–318. [Google Scholar] [CrossRef]
Lei, R.; Sun, Y.; Zhu, S.; Jia, T.; He, Y.; Deng, J.; Liu, W. Investigation on distribution and risk assessment of volatile organic compounds in surface water, sediment, and soil in a chemical industrial park and adjacent area. Molecules 2021, 26, 5988. [Google Scholar] [CrossRef]
Moran, M.J.; Hamilton, P.A.; Zogorski, J.S. Volatile Organic Compounds in the Nation’s Ground Water and Drinking-Water Supply Wells. In Proceedings of the WEFTEC 2007, San Diego, CA, USA, 13–17 October 2007; pp. 2650–2658. [Google Scholar]
Pandey, P.; Yadav, R. A review on volatile organic compounds (VOCs) as environmental pollutants: Fate and distribution. Int. J. Plant Environ. 2018, 4, 14–26. [Google Scholar] [CrossRef]
Song, Y.; Shao, M.; Liu, Y.; Lu, S.; Kuster, W.; Goldan, P.; Xie, S. Source apportionment of ambient volatile organic compounds in Beijing. Environ. Sci. Technol. 2007, 41, 4348–4353. [Google Scholar] [CrossRef]
Liu, Y.; Shao, M.; Zhang, J.; Fu, L.; Lu, S. Distributions and source apportionment of ambient volatile organic compounds in Beijing city, China. J. Environ. Sci. Health 2005, 40, 1843–1860. [Google Scholar] [CrossRef] [PubMed]
Monod, A.; Sive, B.C.; Avino, P.; Chen, T.; Blake, D.R.; Rowland, F.S. Monoaromatic compounds in ambient air of various cities: A focus on correlations between the xylenes and ethylbenzene. Atmos. Environ. 2001, 35, 135–149. [Google Scholar] [CrossRef]
Kheirbek, I.; Johnson, S.; Ross, Z.; Pezeshki, G.; Ito, K.; Eisl, H.; Matte, T. Spatial variability in levels of benzene, formaldehyde, and total benzene, toluene, ethylbenzene and xylenes in New York City: A land-use regression study. Environ. Health 2012, 11, 1–12. [Google Scholar] [CrossRef] [PubMed]
Na, K.; Moon, K.-C.; Kim, Y.P. Source contribution to aromatic VOC concentration and ozone formation potential in the atmosphere of Seoul. Atmos. Environ. 2005, 39, 5517–5524. [Google Scholar] [CrossRef]
Mandal, T.; Yadav, P.; Kumar, M.; Lal, S.; Soni, K.; Yadav, L.; Saharan, U.S.; Sharma, S. Characteristics of volatile organic compounds (VOCs) at an urban site of Delhi, India: Diurnal and seasonal variation, sources apportionment. Urban Clim. 2023, 49, 101545. [Google Scholar] [CrossRef]
Ait-Helal, W.; Borbon, A.; Sauvage, S.; De Gouw, J.; Colomb, A.; Gros, V.; Freutel, F.; Crippa, M.; Afif, C.; Baltensperger, U. Volatile and intermediate volatility organic compounds in suburban Paris: Variability, origin and importance for SOA formation. Atmos. Chem. Phys. 2014, 14, 10439–10464. [Google Scholar] [CrossRef]
Rhew, R.C.; Deventer, M.J.; Turnipseed, A.A.; Warneke, C.; Ortega, J.; Shen, S.; Martinez, L.; Koss, A.; Lerner, B.M.; Gilman, J.B. Ethene, propene, butene and isoprene emissions from a ponderosa pine forest measured by relaxed eddy accumulation. Atmos. Chem. Phys. 2017, 17, 13417–13438. [Google Scholar] [CrossRef]
Sharma, R.C.; Sharma, N. Assessment of variations and correlation of ozone and its precursors, benzene, nitrogen dioxide, carbon monoxide and some Meteorological Variables at two sites of significant spatial variations in Delhi, Northern India. Pollution 2021, 7, 723–737. [Google Scholar]
Barket Jr, D.J.; Grossenbacher, J.W.; Hurst, J.M.; Shepson, P.B.; Olszyna, K.; Thornberry, T.; Carroll, M.A.; Roberts, J.; Stroud, C.; Bottenheim, J. A study of the NO_x dependence of isoprene oxidation. J. Geophys. Res. Atmos. 2004, 109. [Google Scholar] [CrossRef]
Copeland, G.; Ghosh, M.V.; Shallcross, D.E.; Percival, C.J.; Dyke, J.M. A study of the alkene–ozone reactions, 2, 3-dimethyl 2-butene+ O₃ and 2-methyl propene+ O₃, with photoelectron spectroscopy: Measurement of product branching ratios and atmospheric implications. Phys. Chem. Chem. Phys. 2011, 13, 17461–17473. [Google Scholar] [CrossRef]
Copeland, G.; Ghosh, M.V.; Shallcross, D.E.; Percival, C.J.; Dyke, J.M. A study of the ethene-ozone reaction with photoelectron spectroscopy: Measurement of product branching ratios and atmospheric implications. Phys. Chem. Chem. Phys. 2011, 13, 14839–14847. [Google Scholar] [CrossRef] [PubMed]
Parker, J.K.; Davis, S.R. Photochemical reactions of oxygen atoms with toluene, m-xylene, p-xylene, and mesitylene: An infrared matrix isolation investigation. J. Phys. Chem. A 2000, 104, 4108–4114. [Google Scholar] [CrossRef]
Coggon, M.M.; Gkatzelis, G.I.; McDonald, B.C.; Gilman, J.B.; Schwantes, R.H.; Abuhassan, N.; Aikin, K.C.; Arend, M.F.; Berkoff, T.A.; Brown, S.S. Volatile chemical product emissions enhance ozone and modulate urban chemistry. Proc. Natl. Acad. Sci. USA 2021, 118, e2026653118. [Google Scholar] [CrossRef] [PubMed]
Baudic, A.; Gros, V.; Sauvage, S.; Locoge, N.; Sanchez, O.; Sarda-Estève, R.; Kalogridis, C.; Petit, J.-E.; Bonnaire, N.; Baisnée, D. Seasonal variability and source apportionment of volatile organic compounds (VOCs) in the Paris megacity (France). Atmos. Chem. Phys. 2016, 16, 11961–11989. [Google Scholar] [CrossRef]
Li, H.; Andrews, G.E.; Savvidis, D. Impact of Ambient Temperatures on VOC Emissions and OFP During Cold Start for SI Car Real World Urban Driving; 0148-7191; SAE Technical Paper: Warrendale, PA, USA, 2009. [Google Scholar] [CrossRef]
Herbin, H.; Hurtmans, D.; Clarisse, L.; Turquety, S.; Clerbaux, C.; Rinsland, C.P.; Boone, C.; Bernath, P.; Coheur, P.F. Distributions and seasonal variations of tropospheric ethene (C₂H₄) from Atmospheric Chemistry Experiment (ACE-FTS) solar occultation spectra. Geophys. Res. Lett. 2009, 36. [Google Scholar] [CrossRef]
Clapp, L.J.; Jenkin, M.E. Analysis of the relationship between ambient levels of O₃, NO₂ and NO as a function of NO_x in the UK. Atmos. Environ. 2001, 35, 6391–6405. [Google Scholar] [CrossRef]
Monks, P.S.; Granier, C.; Fuzzi, S.; Stohl, A.; Williams, M.L.; Akimoto, H.; Amann, M.; Baklanov, A.; Baltensperger, U.; Bey, I. Atmospheric composition change–global and regional air quality. Atmos. Environ. 2009, 43, 5268–5350. [Google Scholar]
Von Schneidemesser, E.; Monks, P.S.; Plass-Duelmer, C. Global comparison of VOC and CO observations in urban areas. Atmos. Environ. 2010, 44, 5053–5064. [Google Scholar] [CrossRef]
Acton, W.J.F.; Huang, Z.; Davison, B.; Drysdale, W.S.; Fu, P.; Hollaway, M.; Langford, B.; Lee, J.; Liu, Y.; Metzger, S. Surface–atmosphere fluxes of volatile organic compounds in Beijing. Atmos. Chem. Phys. 2020, 20, 15101–15125. [Google Scholar] [CrossRef]
Bryant, D.J.; Dixon, W.J.; Hopkins, J.R.; Dunmore, R.E.; Pereira, K.L.; Shaw, M.; Squires, F.A.; Bannan, T.J.; Mehra, A.; Worrall, S.D. Strong anthropogenic control of secondary organic aerosol formation from isoprene in Beijing. Atmos. Chem. Phys. 2020, 20, 7531–7552. [Google Scholar] [CrossRef]
Seinfeld, J.H.; Pandis, S.N. Atmospheric chemistry and physics: From air pollution to climate change; John Wiley & Sons: Hoboken, NJ, USA, 2016; p. 179. [Google Scholar]
Song, M.; Li, X.; Yang, S.; Yu, X.; Zhou, S.; Yang, Y.; Chen, S.; Dong, H.; Liao, K.; Chen, Q. Spatiotemporal variation, sources, and secondary transformation potential of VOCs in Xi’an, China. Atmos. Chem. Phys. Discuss. 2020, 2020, 1–34. [Google Scholar]
Atkinson, R. Atmospheric chemistry of VOCs and NOx. Atmos. Environ. 2000, 34, 2063–2101. [Google Scholar] [CrossRef]

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
