Advancing Spatiotemporal Pollutant Dispersion Forecasting with an Integrated Deep Learning Framework for Crucial Information Capture

Abstract: This study addressed the limitations of traditional methods in predicting air pollution dispersion, which include restrictions in handling spatiotemporal dynamics, unbalanced feature importance, and data scarcity. To overcome these challenges, this research introduces a novel deep learning-based model, SAResNet-TCN, which integrates the strengths of a Residual Neural Network (ResNet) and a Temporal Convolutional Network (TCN). This fusion is designed to effectively capture the spatiotemporal characteristics and temporal correlations within pollutant dispersion data. The incorporation of a sparse attention (SA) mechanism further refines the model’s focus on critical information, thereby improving efficiency. Furthermore, this study employed a Time-Series Generative Adversarial Network (TimeGAN) to augment the dataset, thereby improving the generalisability of the model. In rigorous ablation and comparison experiments, the SAResNet-TCN model demonstrated significant advances in predicting pollutant dispersion patterns, including accurate predictions of concentration peaks and trends. These results were complemented by a global sensitivity analysis (GSA) and an incremental-addition approach, which identified the optimal combination of input variables for different scenarios by examining their impact on the model’s performance. This study also included visual representations of the maximum downwind hazardous distance (MDH-distance) for pollutants, validated against the Prairie Grass Project Release 31, with the Protective Action Criteria (PAC) and Immediately Dangerous to Life or Health (IDLH) levels serving as hazard thresholds. This comprehensive approach to contaminant dispersion prediction aims to provide an innovative and practical solution for environmental hazard prediction and management.


Introduction
The rapid development of industrialisation and urbanisation has led to an increased focus on the dispersion of atmospheric pollutants as a key research topic in the field of environmental sustainability [1,2]. The dispersion of pollutants exhibits dynamic complexity in time and space and is influenced by a variety of interrelated factors. For instance, the formation of a night-time inversion layer or the occurrence of storms can lead to rapid and non-stationary fluctuations in pollutant dispersion. This complexity makes it difficult to adapt conventional dispersion prediction techniques, compelling researchers to pursue new methods and strategies that overcome the constraints of the conventional paradigm [3,4].
Deep learning models help to better understand pollutant dispersion patterns by analysing and extracting temporal relationships and spatial trends, providing valuable predictive and decision support [5,6]. Previous studies have used Convolutional Neural Networks (CNNs) to extract spatial information from pollutant dispersion data and Recurrent Neural Networks (RNNs) to discover temporal correlations between data [7][8][9]. However, these models still have some limitations. For example, CNNs may lose edge and detail information when processing high-dimensional data, which is unacceptable for representing fine-scale variations in pollutant dispersion [10][11][12]. RNNs, in turn, are prone to vanishing or exploding gradients when dealing with long sequences, which limits their ability to capture complex temporal dependencies [13][14][15]. Therefore, there is a need to develop new modelling approaches to overcome these limitations and improve the accuracy and reliability of predictions.
In this study, we proposed a deep learning architecture incorporating a Residual Neural Network (ResNet) and a Temporal Convolutional Network (TCN), called SAResNet-TCN, which aims to adequately capture the spatial and temporal features of pollutant dispersion. ResNet solves the degradation problem in training deep CNNs by introducing residual learning, which allows the network to learn a deeper representation of the features [16,17]. With residual connections, the model avoids information loss when training deep models and mitigates the problem of gradient vanishing [18,19]. A TCN, in turn, captures long-term dependencies in sequence data while maintaining temporal consistency by introducing causal convolutions and a dilated layer structure [9,14]. Combining ResNet with a TCN yields a more comprehensive feature representation and modelling capability, enhancing the model's ability to represent complex patterns and improving prediction accuracy. In addition, a sparse attention (SA) mechanism was incorporated in this study so that the sparsity it introduces strengthens the model's focus on key features in time-series data [20]. This strategy improves the sensitivity of the model to key temporal nodes and features in the prediction of pollutant dispersion peaks, supporting accurate prediction. These advantages enable SAResNet-TCN to predict pollutant dispersion patterns, especially the unconventional dispersion behaviours caused by unexpected events.
The main contributions of this study can be summarised as follows:
1. An innovative deep learning model named SAResNet-TCN is introduced. This model is uniquely designed with an attention branch that operates in parallel with ResNet to extract and integrate key features, thereby enhancing the model's ability to identify and capture significant features. In addition, the integration of the TCN module ensures that the model effectively captures the temporal dependencies between features. Validated through case studies, the proposed model demonstrated improved accuracy in predicting pollutant dispersion, providing a novel approach to support environmental sustainability.
2. A global sensitivity analysis (GSA) was used to improve the interpretability and practical value of deep learning models in environmental management. Using the Sobol method, the impact of input parameters on model outputs was quantitatively analysed, identifying factors such as wind speed, atmospheric stability level, and pollution source strength as dominant influences on pollutant dispersion. Through the incremental addition of parameters into the experiments, this study further identified the key parameters that should be prioritised in resource-constrained scenarios and provides recommendations for effective resource allocation.
3. By applying standards such as the Protective Action Criteria (PAC) and Immediately Dangerous to Life or Health (IDLH) values, the maximum distances to hazard (MDH-distances) were visualised at different sulphur dioxide (SO₂) concentration hazard thresholds according to the six weather conditions defined in GB/T 37243-2019 [21]. The results not only help to assess the serious consequences of uncontrolled toxic releases but also provide an important basis for worker protection in industrial environments.
This paper is organised as follows: Section 2 outlines the specific details of the proposed pollutant dispersion prediction model. Section 3 presents the experimental results and describes the results for different hazard thresholds. Finally, the main conclusions of this study are summarised in Section 4.

ResNet
When tackling the task of pollutant dispersion modelling, ResNet uses convolution kernels to capture the distribution of pollutant concentrations in a local area and identify local concentration variations and spatial relationships. This local perception allows ResNet to effectively capture the local characteristics of pollutant dispersion, thereby improving the accuracy of the model. Compared to a plain CNN, ResNet incorporates residual connections. This structure boosts the network's propagation efficiency by combining residual units with directly connected (shortcut) edges, which significantly reduces the loss of original information [22]. ResNet is composed of basic residual blocks, as shown in Figure 1. Each basic residual block comprises a mapping section and a residual section, with the core formula shown in Equation (1).
$$x_{l+1} = x_l + F(x_l, W_l) \tag{1}$$
In the formula, $x_{l+1}$ and $x_l$ are the feature inputs of the $(l+1)$th and $l$th layers of the model, respectively; $W_l$ refers to the weight parameters of the residual cells; and $F(\cdot)$ denotes the residual mapping.
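To make the residual update concrete, the following is a minimal sketch in plain Python. It assumes the standard identity-shortcut form $x_{l+1} = x_l + F(x_l, W_l)$, and it stands in a small matrix product for the convolutional layers; the helper names (`linear`, `residual_block`) are hypothetical and are not taken from the paper.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(v: Vector) -> Vector:
    """Element-wise rectified linear unit."""
    return [max(0.0, a) for a in v]


def linear(w: Matrix, v: Vector) -> Vector:
    """Matrix-vector product, used here as a stand-in for a convolution."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Return x + F(x, W), where F is two weight layers with a ReLU between."""
    f = linear(w2, relu(linear(w1, x)))      # residual branch F(x_l, W_l)
    return [xi + fi for xi, fi in zip(x, f)]  # identity shortcut


x = [1.0, -2.0, 0.5]
zero = [[0.0] * 3 for _ in range(3)]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]

# With zero weights the residual branch vanishes and the input passes through.
passthrough = residual_block(x, zero, zero)
# With identity weights the branch contributes relu(x).
shifted = residual_block(x, identity, identity)
```

With all-zero weights the block returns its input unchanged, which illustrates why the shortcut helps preserve the original information even when the learned branch contributes little.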

TCN
A TCN is well suited to processing time-series data and has demonstrated better performance than LSTM and GRU [23,24]. The fundamental elements of a TCN are dilated causal convolution and residual connections. Dilated causal convolution combines causality with dilation: by incorporating dilation into the convolutional kernel, it widens the receptive field of the kernel without sacrificing causality, thereby capturing a greater amount of contextual information. This technique is highly valuable for handling time-series data, permitting the TCN to model long-term dependencies effectively while maintaining causality. Residual connections mitigate gradient vanishing while improving the flow of information through the network. They also stabilise the training process, allowing the network to acquire complex feature representations at a deeper level [25]. Figure 2 shows a diagram of the basic modules of a TCN.
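The effect of dilation on a causal convolution can be illustrated with a short single-channel sketch in plain Python. The function names are hypothetical, and the receptive-field formula assumes one convolution per dilation level, which is a simplification of the full TCN block.

```python
from typing import List


def dilated_causal_conv1d(x: List[float], kernel: List[float],
                          dilation: int) -> List[float]:
    """y[t] = sum_k kernel[k] * x[t - k * dilation], treating x[t < 0] as zero.

    Only the current and past samples contribute to y[t], so the
    operation is causal by construction.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def receptive_field(kernel_size: int, dilations: List[int]) -> int:
    """Receptive field of a stack with one causal convolution per level."""
    return 1 + (kernel_size - 1) * sum(dilations)


signal = [1.0, 2.0, 3.0, 4.0, 5.0]
y = dilated_causal_conv1d(signal, kernel=[1.0, 1.0], dilation=2)
rf = receptive_field(kernel_size=2, dilations=[1, 2, 4])
```

Here each output adds the current sample to the sample two steps earlier, and stacking dilations 1, 2, and 4 with a kernel of size 2 already covers eight time steps.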

SA Mechanism
The roots of attention mechanisms can be traced back to the examination of human cognitive processes and explorations in neuroscience. Early attention mechanisms played an important role in deep learning by allowing models to selectively focus on specific information while processing input [26,27]. Nonetheless, traditional attention mechanisms generally have a global scope that assigns attention weights to all elements in the input. As a result, the computational and storage requirements are substantial [28]. The recently proposed SA mechanism improves the traditional attention mechanism by introducing sparsity, which significantly reduces the time complexity [29]. The calculation formula is displayed in Equation (2). The SA mechanism allows the model to prioritise task-relevant key information and disregard irrelevant elements by selectively assigning attention weights. This precise allocation of attention improves the model's efficiency and scalability while mitigating the risk of overfitting [29,30].
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d}}\right) V \tag{2}$$
where $K$ and $V$ represent the key matrix and the value matrix; $Q$ denotes the sparse matrix; $d$ represents the dimension of $Q$; and $T$ stands for the transpose operation of the matrix.
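As an illustration of how sparsity reduces the work done per query, the sketch below keeps only the highest-scoring keys for each query before applying the softmax. This top-k rule is an assumption chosen for simplicity; this section does not specify the exact sparsification rule used in SAResNet-TCN, and the function name `topk_sparse_attention` is hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def topk_sparse_attention(q: Matrix, k: Matrix, v: Matrix,
                          top_k: int) -> Matrix:
    """Scaled dot-product attention restricted to the top_k keys per query."""
    d = len(q[0])
    out = []
    for qi in q:
        scores = [sum(a * b for a, b in zip(qi, kj)) / math.sqrt(d)
                  for kj in k]
        # Indices of the top_k largest scores; all other keys get zero weight.
        keep = sorted(range(len(scores)), key=lambda j: scores[j],
                      reverse=True)[:top_k]
        weights = softmax([scores[j] for j in keep])
        out.append([sum(w * v[j][c] for w, j in zip(weights, keep))
                    for c in range(len(v[0]))])
    return out


queries = [[1.0, 0.0]]
keys = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
values = [[1.0], [2.0], [3.0]]

sparse_out = topk_sparse_attention(queries, keys, values, top_k=1)
dense_out = topk_sparse_attention(queries, keys, values, top_k=3)
```

With `top_k` equal to the number of keys the function reduces to ordinary dense attention, so the two settings can be compared directly.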

TimeGAN
TimeGAN is a Generative Adversarial Network (GAN) variant that combines the flexibility of unsupervised learning with the precise control of supervised learning, allowing for finer-grained dynamic tuning of the network [31]. In the standard GAN framework, the core component is an adversarial module consisting of two networks: a generator and a discriminator [32]. Through adversarial training between the generator and the discriminator, the GAN continuously improves the data generated by the generator, making them increasingly realistic, while at the same time improving the ability of the discriminator to distinguish between real and generated data.
TimeGAN not only contains the adversarial module of the traditional GAN but also adds an autoencoding module [15]. The main function of this autoencoding module is to perform dimensionality reduction on the data. It consists of two parts, namely, an embedding function and a recovery function, which are connected by latent codes. The embedding function uses the hidden function to convert the data into a low-dimensional representation, and then the data are sent to the discriminator for screening. After screening by the discriminator, the data are inversely transformed by the recovery function to produce an enhanced dataset. To introduce time-series relationships between the data in the GAN, TimeGAN uses a supervised loss function based on an autoregressive learning algorithm. This allows the network to learn and model time-dependent probabilities, which, in turn, generates data with time-series properties [33]. Figure 3 shows how data are processed and generated by the different modules of the network in TimeGAN.
Sustainability 2024, 16, x FOR PEER REVIEW

The Proposed Model: SAResNet-TCN
Figure 4 depicts the SAResNet-TCN framework, followed by a sequential explanation of its fundamental stages.
(1) Data Preprocessing
In the construction of deep learning models, the diversity of the dataset, which may include a variety of data types, can cause features with larger numerical values to have a more significant influence on the model, while the influence of features with smaller numerical values is reduced. To address this issue, this study used the min-max normalisation method in the data preprocessing stage. This method rescales the data and reduces the scale differences between features, thereby improving the generalisation ability of the model to new data and helping to mitigate overfitting. The calculation formula for this method is shown in Equation (3).
$$x_{\mathrm{normal}} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \tag{3}$$
In the equation, $x_{\mathrm{normal}}$ is the value obtained after normalisation; $x$ represents the data to be normalised; $x_{\min}$ is the minimum value in the data; and $x_{\max}$ is the maximum value in the data.
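A minimal sketch of min-max normalisation for a single feature column is given below. The function name and the choice to return the fitted minimum and maximum alongside the scaled values are illustrative assumptions, made so that the same bounds can be reused later.

```python
from typing import List, Tuple


def min_max_normalise(values: List[float]) -> Tuple[List[float], float, float]:
    """Scale values to [0, 1] and return them with the fitted min and max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant feature has no spread, so the ratio is undefined.
        raise ValueError("cannot normalise a constant feature")
    return [(v - lo) / (hi - lo) for v in values], lo, hi


wind_speed = [2.0, 4.0, 6.0]  # hypothetical feature values
scaled, x_min, x_max = min_max_normalise(wind_speed)
```

In practice the minimum and maximum would normally be fitted on the training set only and then applied to the test set, so that no information leaks from the test data.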
(2) Data Augmentation
During the training process, the goal of TimeGAN is to generate time-series data that are statistically similar to the real data. First, the embedding network transforms the input time-series data into a low-dimensional representation that captures the intrinsic structure and patterns of the data. The recovery network then reverses this process, converting the low-dimensional representation back into the original time-series data, which helps the network learn the key features of the data. The generator then uses the mechanism of Generative Adversarial Networks to generate new time-series data to closely approximate the distribution of the real data. Finally, the discriminator discriminates between the generated time-series data and the real data, helping the generator to better simulate the distribution of the real data.

(3) Feature Extraction and Fusion
This study used a combination of ResNet and a sparse attention mechanism to complete this crucial step. An attention branch parallel to ResNet was developed to improve the model's ability to capture salient features. ResNet consists of several layers of one-dimensional networks, with the core structure consisting of convolutional, batch normalisation (BN), and pooling layers. The convolutional layers are responsible for local perception and feature extraction; the BN layers aim to accelerate the training process and improve model robustness; the pooling layers perform statistical operations on high-level features within the pooling region to output effective statistical features and achieve dimensionality reduction. The ResNet module contains two basic blocks, each consisting of successive convolutional layers, BN layers, and activation functions. The introduction of residual connections allows the network to learn complex feature representations more deeply. To further improve the model's focus on important features and reduce the interference of unimportant features, a sparse attention module was introduced to address the limitations of ResNet in distinguishing the importance of temporal features. Following the literature [34,35], an element-wise multiplication method was used to fuse the outputs of the ResNet and attention mechanism modules, and these fused features are subsequently used as inputs to the TCN module.
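The fusion step can be sketched as an element-wise (Hadamard) product of two equally shaped arrays. The sketch assumes that both branches produce outputs of the same shape (time steps by channels); the function name `fuse` and the example values are hypothetical.

```python
from typing import List

Matrix = List[List[float]]


def fuse(resnet_features: Matrix, attention_weights: Matrix) -> Matrix:
    """Element-wise product of the ResNet features and the attention output."""
    same_rows = len(resnet_features) == len(attention_weights)
    same_cols = all(len(f) == len(w)
                    for f, w in zip(resnet_features, attention_weights))
    if not (same_rows and same_cols):
        raise ValueError("both branches must produce the same shape")
    return [[f * w for f, w in zip(f_row, w_row)]
            for f_row, w_row in zip(resnet_features, attention_weights)]


features = [[1.0, 2.0], [3.0, 4.0]]   # hypothetical ResNet branch output
weights = [[1.0, 0.0], [0.5, 0.5]]    # hypothetical attention branch output
fused = fuse(features, weights)
```

A weight of zero suppresses the corresponding feature entirely, which is how the attention branch reduces the interference of unimportant features before they reach the TCN module.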

(4) Temporal Relationship Extraction
The model captures the temporal dependencies between features through the TCN module and integrates them into the final prediction process. In this process, the fused input features are first subjected to dilated causal convolution operations, followed by processing through the ReLU activation function. To prevent model overfitting, a dropout layer is then introduced to randomly discard some nodes. The results of this process not only serve as input for further processing in the following dilated causal convolution, activation, and dropout layers but are also sent to a one-dimensional convolution layer for processing. The results of these two parts are linearly combined to form an output result with a residual connection, which is then sent to subsequent residual blocks for further computation. When the computation of all residual blocks is complete, the output of the residual blocks is combined with a fully connected layer and a softmax layer to produce the final prediction results.
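The structure of one TCN residual block can be sketched for a single channel as follows. The sketch follows the commonly used TCN block layout, in which the shortcut (a 1x1 convolution) is applied to the block input, and it omits dropout because dropout is inactive at inference time. Both choices are assumptions made for illustration, and the function names are hypothetical.

```python
from typing import List


def causal_conv(x: List[float], kernel: List[float],
                dilation: int) -> List[float]:
    """Single-channel dilated causal convolution with zero left-padding."""
    return [sum(w * x[t - k * dilation]
                for k, w in enumerate(kernel) if t - k * dilation >= 0)
            for t in range(len(x))]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def tcn_residual_block(x: List[float], kernel1: List[float],
                       kernel2: List[float], dilation: int,
                       shortcut_weight: float = 1.0) -> List[float]:
    """Two (conv -> ReLU) stages plus a shortcut, summed at the output."""
    h = relu(causal_conv(x, kernel1, dilation))
    h = relu(causal_conv(h, kernel2, dilation))
    # On one channel a 1x1 convolution is just a scalar multiplication.
    skip = [shortcut_weight * v for v in x]
    return [a + b for a, b in zip(h, skip)]


block_out = tcn_residual_block([1.0, 2.0, 3.0], kernel1=[1.0, 0.0],
                               kernel2=[1.0, 0.0], dilation=1)
```

Because every stage is causal, changing a later input sample cannot alter earlier outputs, which is the property that makes the block suitable for forecasting.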
(5) Output of Prediction Results
In the preceding steps, the data are normalised to eliminate the dimensional differences between features. To obtain the final prediction results, the original scale of the predicted data is restored through a de-normalisation process, ensuring consistency with other relevant data and guaranteeing the interpretability and usability of the results. This step is an essential part of the entire prediction process and allows the model output to be interpreted and used in a meaningful way. The calculation formula is shown in Equation (4).
$$x^{L}_{o} = x^{L}_{\mathrm{normal}} \left( x^{L}_{\max} - x^{L}_{\min} \right) + x^{L}_{\min} \tag{4}$$
where $x^{L}_{o}$ is the predicted value after inverse normalisation; $x^{L}_{\mathrm{normal}}$ is the predicted value output from the model; and $x^{L}_{\max}$ and $x^{L}_{\min}$ denote the maximum and minimum values of the original data, respectively.
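The inverse transformation is a one-line rescaling, sketched below. The function name and the example bounds are hypothetical; the bounds are assumed to be the minimum and maximum of the original target data, as stated in the definition of Equation (4).

```python
from typing import List


def denormalise(normalised: List[float], x_min: float,
                x_max: float) -> List[float]:
    """Map model outputs in [0, 1] back to the original concentration scale."""
    return [v * (x_max - x_min) + x_min for v in normalised]


model_output = [0.0, 0.5, 1.0]   # hypothetical normalised predictions
restored = denormalise(model_output, x_min=2.0, x_max=6.0)
```

Applying this function to the output of min-max normalisation with the same bounds recovers the original values, so the two steps form an exact round trip up to floating-point error.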
The choice of model parameters is closely related to the type of task and dataset. Based on other similar studies, this study experimentally determined the appropriate parameters for the model [36][37][38][39][40][41]. The main parameter settings are presented in Table 1.

Description of the Dataset
In the summer of 1956, the Prairie Grass Project was conducted, and the dataset it collected is still regarded as one of the most comprehensive in situ atmospheric dispersion datasets to date. It is capable of revealing patterns of hazardous gas diffusion. The experiment was conducted 5 miles northeast of O'Neill, Nebraska (42.49° N, 98.57° W). In the experiment, SO₂ was released from a point source at a height of 0.46 m above the ground. Air concentrations at a height of 1.5 m were sampled every 10 min at distances of 50, 100, 200, 400, and 800 m downwind. Given that the Prairie Grass Project's dataset accurately reflects the diffusion of toxic gases in real-world environments, it was selected as the optimal data source for the simulation experiments conducted in this study [42][43][44][45]. The 68 sets of experimental data were divided into training and test sets, as shown in Table 2 (where Release x is the number of the experiment). The common monitoring parameters in the Prairie Grass Project are shown in Table 3.

Evaluation Indicators
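To assess the model's performance from multiple perspectives, this study employed the root mean square error (RMSE), mean absolute error (MAE), and the coefficient of determination (R²) as metrics, with Equations (5)-(7) providing the formulas. The MAE measures the average magnitude of the errors, while the RMSE gives more weight to large errors. Both are non-negative, with lower values indicating better agreement between the predicted and actual results. R² represents the adequacy of the model fit to the data points; it is at most 1 and typically lies between 0 and 1, although it can become negative when a model fits worse than the mean of the data. A higher value indicates a better model fit.
$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( y^{\mathrm{exp}}_{i} - y^{\mathrm{cal}}_{i} \right)^{2}} \tag{5}$$
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| y^{\mathrm{exp}}_{i} - y^{\mathrm{cal}}_{i} \right| \tag{6}$$
$$R^{2} = 1 - \frac{\sum_{i=1}^{N} \left( y^{\mathrm{exp}}_{i} - y^{\mathrm{cal}}_{i} \right)^{2}}{\sum_{i=1}^{N} \left( y^{\mathrm{exp}}_{i} - y^{\mathrm{exp}}_{\mathrm{ave}} \right)^{2}} \tag{7}$$
where $N$ represents the total number of data points, and $y^{\mathrm{exp}}_{i}$, $y^{\mathrm{cal}}_{i}$, and $y^{\mathrm{exp}}_{\mathrm{ave}}$ correspond to the experimental, calculated, and average values of the experiments, respectively.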
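The three metrics can be computed directly from their definitions, as in the following sketch. The function names and the sample values are hypothetical and are not taken from the Prairie Grass data.

```python
import math
from typing import List


def rmse(y_exp: List[float], y_cal: List[float]) -> float:
    """Root mean square error between experimental and calculated values."""
    n = len(y_exp)
    return math.sqrt(sum((e - c) ** 2 for e, c in zip(y_exp, y_cal)) / n)


def mae(y_exp: List[float], y_cal: List[float]) -> float:
    """Mean absolute error."""
    return sum(abs(e - c) for e, c in zip(y_exp, y_cal)) / len(y_exp)


def r2(y_exp: List[float], y_cal: List[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_exp) / len(y_exp)
    ss_res = sum((e - c) ** 2 for e, c in zip(y_exp, y_cal))
    ss_tot = sum((e - mean) ** 2 for e in y_exp)
    return 1.0 - ss_res / ss_tot


observed = [1.0, 2.0, 3.0, 4.0]    # hypothetical measured concentrations
predicted = [2.0, 2.0, 3.0, 3.0]   # hypothetical model predictions

scores = (rmse(observed, predicted), mae(observed, predicted),
          r2(observed, predicted))
```

For these sample values the residual sum of squares is 2 and the total sum of squares is 5, so the squared errors are penalised more heavily by the RMSE than the absolute errors are by the MAE.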

Validation of Model Prediction Results
To qualitatively assess the accuracy of TimeGAN-synthesised data, this study adopted a comparative approach based on real data trends.Specifically, the similarity between real data, TimeGAN-synthesised data, and traditional GAN-synthesised data is determined using 1000 randomly selected training datasets as benchmarks.This comparison shows the effectiveness of the synthetic data in simulating the real data distribution, thus validating the performance of the TimeGAN model in the data generation task.The experimental results are shown in Figure 5.
As can be seen in Figure 5, compared to the data synthesised by the traditional GAN, the dataset synthesised by TimeGAN showed a greater ability to simulate real data trends. While the traditional GAN typically works to capture the distributional characteristics of data, when dealing with time-dependent sequential data, this approach tends to ignore the dynamic correlations and temporal dependencies in the time series. TimeGAN was designed with this unique property of temporal data in mind. It learns and preserves the evolutionary patterns of data over time through embedded temporal correlation structures, producing synthetic data that not only resembles the original data in terms of static distribution but also more accurately reflects the characteristics of the original time-series data in terms of dynamic trends. This makes TimeGAN a more desirable choice for data analysis and predictive models that need to account for temporal dynamics.
To ensure the effectiveness of model training and to assess the quality of the data generated by TimeGAN, a 5-fold cross-validation method was used in this study. K-fold cross-validation is commonly used to evaluate the performance of deep learning models; when dealing with time-series data, it is crucial that the folds do not break the sequences at random. This preserves the continuity and integrity of the time-series data, allowing the model to effectively capture the dynamics of the time series. Using this approach, we compared the performance of models trained on raw data, TimeGAN-synthesised data, and GAN-synthesised data under the same conditions to confirm the validity and reliability of the synthetic data. The experimental results are presented in Table 4.
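One way to form folds without randomly breaking a sequence is to cut it into contiguous blocks, as sketched below. This is an assumption about how such a split could be implemented; the exact fold construction used in this study is not described here, and the function name `contiguous_kfold` is hypothetical.

```python
from typing import List, Tuple


def contiguous_kfold(n_samples: int,
                     n_splits: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split indices 0..n_samples-1 into n_splits contiguous test blocks.

    Each fold returns (train_indices, test_indices); the test block is a
    single unbroken run, so the temporal order inside it is preserved.
    """
    base, extra = divmod(n_samples, n_splits)
    folds = []
    start = 0
    for i in range(n_splits):
        size = base + (1 if i < extra else 0)
        stop = start + size
        test = list(range(start, stop))
        train = [j for j in range(n_samples) if j < start or j >= stop]
        folds.append((train, test))
        start = stop
    return folds


folds = contiguous_kfold(n_samples=10, n_splits=5)
```

Every sample appears in exactly one test block, and the training indices of each fold never overlap with its test block.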
To ensure the effectiveness of model training and to assess the quality of the data generated by TimeGAN, 5-fold cross-validation was used in this study. K-fold cross-validation is commonly used to evaluate the performance of deep learning models; with time-series data, it is crucial that the folds do not break the sequence at random, so that the continuity and integrity of the series are maintained and the model can capture its dynamics. Using this approach, we compared the performance of models trained on raw data, TimeGAN-synthesised data, and GAN-synthesised data under the same conditions to confirm the validity and reliability of the synthetic data. The experimental results are presented in Table 4.
The models trained on TimeGAN-synthesised data outperformed those trained on the original data on three key metrics: the RMSE, MAE, and R². Specifically, the RMSE fell from 19.12977 to 13.6521, the MAE fell from 9.9089 to 6.6283, and R² rose from 0.8593 to 0.9697. This result indicates that TimeGAN-synthesised data reproduce the statistical characteristics and time dependencies of the real dataset well, allowing models to capture data patterns more accurately and, thus, improve the accuracy of predictive tasks. A further comparison showed that the models trained on TimeGAN-synthesised data also outperformed those trained on traditional GAN-synthesised data, with lower RMSE and MAE values and an R² value closer to the ideal of 1. This difference in performance may be due to the specialised design of TimeGAN for time-series data, as it better captures the overall distribution and temporal correlations. In contrast, the traditional GAN may not fully capture the dynamic nature of time-series data, which affects the final predictive performance. In conclusion, using TimeGAN for data augmentation provides a high-quality database for model training, which significantly improves the predictive power of the model.
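The evaluation procedure described above can be illustrated with a minimal sketch. It assumes that each test fold is a contiguous block of the series (one way of avoiding random breaks; the exact fold construction used in this study is not specified), and the function names `rmse`, `mae`, `r2`, and `contiguous_folds` are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r2(y_true, y_pred):
    """Coefficient of determination (1 is a perfect fit)."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def contiguous_folds(n_samples, k=5):
    """Yield (train_idx, test_idx) pairs; every test fold is an unbroken block.

    Assumption: keeping each fold contiguous is how 'random breaks' are avoided.
    """
    base, extra = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        size = base + (1 if fold < extra else 0)
        test_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, test_idx
        start += size
```

In use, a model would be refitted on each `train_idx` and scored on the matching `test_idx`, and the three metrics would be averaged over the five folds.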
Since the model complexity comes from the network structure, we used dropout regularisation to globally control the model complexity during training in order to reduce the possibility of overfitting. The experimental results are shown in Table 5.
The predictions of the model were then validated against real-world data to ensure their accuracy and usefulness. Figure 6 shows the relationship between the predicted outputs of the model and the real-world data. There was a high degree of consistency between the model's predicted output and the actual data. This indicates that the proposed model has a good predictive capability and that its predicted output does not suffer from systematic bias. In other words, the model can accurately capture the intrinsic trends of the data and maintain a high degree of accuracy in its predictions.
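For readers unfamiliar with dropout, the following is a minimal sketch of the common "inverted dropout" formulation. It is an illustration only: the dropout rate and the layers to which dropout was applied in this study are not stated here, and the function name `dropout` is hypothetical.

```python
import random


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each activation with probability `rate` and
    rescale the survivors by 1 / (1 - rate) so the expected value is unchanged.
    At inference time (training=False) the input passes through untouched.
    """
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]
```

Because units are dropped at random in every training step, the network cannot rely on any single activation, which is the mechanism by which dropout limits effective model complexity.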

Ablation Experiments
Ablation experiments are employed to investigate how the specific components of a model impact its performance by systematically removing them. This experimental approach provides readers with insights into the role that each component plays in the model's overall performance and helps validate the model's design rationale.
In the ablation experiments, the proposed model served as the baseline, and we evaluated the performance gap between each ablated model and this baseline by excluding the components TimeGAN, SA, TCN, and ResNet, one by one. Figure 7 and Table 6 show the results of the ablation experiment. The results of ablation experiment 2 (Abl exp. 2) show that the model without data augmentation did not perform well on the test set containing unseen data and, thus, lacks a generalisation ability. As demonstrated by the results of Abl exp. 3, the model's RMSE increased by over 25% upon the removal of the SA mechanism module. Figure 7b,c show that removing the SA mechanism module significantly reduced the model's ability to predict peaks and troughs. This suggests that the SA mechanism helps the model to focus on crucial information within the sequence, thereby improving its accuracy in predicting abrupt changes. According to the results of Abl exp. 4 and Abl exp. 5, the TCN and ResNet modules also have a significant impact on the performance of the overall model. ResNet has translational invariance, and TCN has temporal invariance. Combining the advantages of both strengthens the model's ability to perceive spatial and temporal relationships.
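To make the role of the TCN component concrete, the sketch below implements a one-dimensional dilated causal convolution and a simplified residual block of the kind named in the caption of Figure 2. It is a toy, single-channel illustration under stated assumptions (zero padding on the left, a ReLU activation, no normalisation or dropout); the function names are hypothetical, and the layer configuration used in the proposed model is not reproduced here.

```python
def dilated_causal_conv(x, weights, dilation=1):
    """y[t] = sum_k weights[k] * x[t - k * dilation].

    Only current and past samples are used (causal); indices before the
    start of the sequence are treated as zero padding.
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(weights):
            idx = t - k * dilation
            if idx >= 0:
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_block(x, weights, dilation=1):
    """Simplified residual block: ReLU(conv(x)) + x (identity skip connection)."""
    conv = dilated_causal_conv(x, weights, dilation)
    activated = [max(0.0, c) for c in conv]
    return [a + xi for a, xi in zip(activated, x)]
```

For example, `dilated_causal_conv([1, 2, 3, 4], [1, 1], dilation=2)` combines each sample with the one two steps earlier. Stacking such layers with increasing dilation widens the receptive field without the output at time t ever depending on later samples.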

Comparison Experiment
The performance of the proposed model was evaluated by comparing it with that of other models with different configurations. The comparison experiments revealed the limitations and room for improvement in the model design and helped in selecting the most appropriate model to achieve the desired results.
Figure 8 illustrates the prediction results of the model incorporating different attention mechanisms. The model without an attention mechanism had the poorest performance. Compared with the multi-head attention (Mh_A) mechanism, the SA mechanism markedly improved the model's prediction results. The SA mechanism concentrates on important task-relevant information and allocates attentional weights precisely, which strengthens the model's performance and ability to generalise and reduces the likelihood of overfitting. Although an attention mechanism may lengthen the model's training time, its advantages make it an efficient tool for enhancing prediction accuracy. Compared with the Mh_A mechanism, the SA mechanism also has the potential to decrease computational time while still expressing attention. This reduction in computation time broadens its applicability and is particularly important in the case of sudden toxic gas leaks, because the relevant authorities can respond and carry out rescue operations more quickly, reducing the impact of pollution incidents on human health and public safety.
In the comparison experiment, we also combined different models to determine the optimal model combination. The errors of the predictions of the proposed model and those of the SACNN-TCN, SABPNN-TCN, SAResNet-LSTM, and SAResNet-GRU models are compared in Figure 9. The fusion model using a backpropagation neural network (BPNN) performed poorly, with an absolute error of more than 80 mg/m³ at the peak value. This could be attributed to the BPNN being based on gradient optimisation, which is susceptible to locally optimal solutions and makes it difficult to reach the global optimum. The proposed model also forecast more accurately than SACNN-TCN: the peak absolute error of SACNN-TCN exceeded 60 mg/m³, whereas that of the proposed model was less than 40 mg/m³. In addition, the proposed model demonstrated greater predictive stability, exhibiting less overall deviation from the true value. This advantage is attributable to the residual connections, which enable ResNet to learn incremental changes in features and thereby capture information in the input data more efficiently. Of the two models based on variants of the Recurrent Neural Network (RNN), namely long short-term memory (LSTM) and the gated recurrent unit (GRU), SAResNet-GRU outperformed SAResNet-LSTM. This can be ascribed to the reset gate mechanism within the GRU, which enhances the model's flexibility and enables it to adapt better to patterns in sequences with different time scales. However, compared to the GRU, the TCN achieved more efficient gradient propagation through convolutional operations. Therefore, the proposed model exhibited better stability and exceptional predictive capabilities.
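The idea of an attention mechanism that spends its weight on only the most relevant positions can be sketched as follows. This is an assumption-laden illustration, not the SA module of the proposed model: it uses a top-k rule (only the k highest-scoring keys receive non-zero weight), which is one common way of making attention sparse, and the function name `topk_sparse_attention` is hypothetical.

```python
import math


def topk_sparse_attention(query, keys, values, k):
    """Scaled dot-product attention restricted to the k best-matching keys.

    Returns (weights, context). Keys outside the top k get weight exactly 0,
    so they contribute nothing to the context vector.
    """
    d = len(query)
    scores = [sum(q * c for q, c in zip(query, key)) / math.sqrt(d) for key in keys]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    # Softmax over the selected scores only (shifted by the max for stability).
    m = max(scores[i] for i in top)
    exps = {i: math.exp(scores[i] - m) for i in top}
    z = sum(exps.values())
    weights = [exps[i] / z if i in exps else 0.0 for i in range(len(scores))]
    dim = len(values[0])
    context = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]
    return weights, context
```

Because only k of the positions enter the softmax and the weighted sum, the cost of the aggregation step scales with k rather than with the sequence length, which is consistent with the lower computation time reported for sparse attention relative to dense multi-head attention.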

GSA
Deep learning models are effective in various tasks but often lack interpretability, which causes several problems [46]. Firstly, environmental managers and decision-makers may struggle to understand the model's reasoning, reducing their trust. Secondly, unclear predictions could lead to misinterpretations, particularly for critical tasks involving life and environment preservation, resulting in serious consequences.
GSA is critical when determining the impact of input parameters on a model's output, allowing researchers to identify parameters that require more precise measurement or careful adjustment. The Sobol method is a GSA approach that quantifies the contribution of each parameter to the variance in the total model output, either through changes in the parameter itself or through interactions with other parameters [46–48]. The results of the Sobol method analysis are depicted in Figure 10. Wind speed, atmospheric stability class, and source strength (the mass of pollutant emitted from a source into the atmosphere per unit of time) were the most significant factors influencing the dispersion of pollutants. The last seven indicators, such as the directional standard deviation, were of relatively minor importance.
To better understand how the input indicators affect the predictive performance of the model, an additive-by-addition approach was employed. This involved progressively adding the indicators in order of their contribution and gauging the effect of each addition on the model's predictive capacity. The experimental results shown in Figure 11 demonstrate that omitting indicators with minimal contributions had a negligible impact on the accuracy of the predictions. Consequently, when resource limitations preclude access to all parameters, researchers should prioritise those that exert a pronounced influence on the model output. In such resource-constrained scenarios, a high prediction accuracy can still be achieved by selecting key input parameters such as downwind distance, crosswind distance, source strength, source height, wind speed, wind direction, atmospheric stability class, and air temperature. These parameters are well suited to high-precision forecasting because they play a crucial role in simulating the dispersion of pollutants in the atmosphere. The results presented in Figure 11 demonstrate that focusing on the parameters that contribute most significantly to predictive accuracy is an effective strategy for maintaining the predictive performance of a model when not all parameters are available.
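The variance decomposition behind the Sobol method can be sketched with a small Monte Carlo estimator. The code below uses the widely used "pick-and-freeze" scheme with two independent sample matrices; it is an illustration under stated assumptions (independent inputs, each scaled to the unit interval, and a cheap toy function standing in for the trained network), and the function name `sobol_indices` is hypothetical rather than the implementation used in this study.

```python
import random


def sobol_indices(model, n_inputs, n_samples, rng):
    """Estimate first-order and total-order Sobol indices.

    Assumptions: inputs are independent and uniform on [0, 1];
    `model` maps a list of n_inputs floats to a single float.
    """
    A = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    B = [[rng.random() for _ in range(n_inputs)] for _ in range(n_samples)]
    y_a = [model(row) for row in A]
    y_b = [model(row) for row in B]
    # Centre the outputs; this lowers the variance of the estimators.
    mean = (sum(y_a) + sum(y_b)) / (2 * n_samples)
    y_a = [y - mean for y in y_a]
    y_b = [y - mean for y in y_b]
    var = (sum(y * y for y in y_a) + sum(y * y for y in y_b)) / (2 * n_samples)

    first, total = [], []
    for i in range(n_inputs):
        # AB_i: matrix A with column i taken from B.
        y_ab = []
        for row_a, row_b in zip(A, B):
            row = list(row_a)
            row[i] = row_b[i]
            y_ab.append(model(row) - mean)
        v_i = sum(yb * (yab - ya) for ya, yb, yab in zip(y_a, y_b, y_ab)) / n_samples
        vt_i = sum((ya - yab) ** 2 for ya, yab in zip(y_a, y_ab)) / (2 * n_samples)
        first.append(v_i / var)   # share of variance due to input i alone
        total.append(vt_i / var)  # share including interactions with other inputs
    return first, total
```

For the additive toy function y = x0 + 2·x1, the exact first-order indices are 0.2 and 0.8 and equal the total-order indices, since there are no interactions. A gap between the two indices for an input signals that it acts mainly through interactions with other inputs.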

Application of Hazard Threshold Analysis
To further enhance the usefulness of this study, the maximum downwind hazardous distance (MDH-distance) based on different SO₂ hazard thresholds was visualised. In particular, we used Release 31 as a case study and calculated the release concentrations according to the six weather conditions specified in GB/T 37243-2019 [21]. Figure 12 depicts the change in the MDH-distance over different hazard thresholds. An explanation of these hazard thresholds is given in Table 7.
The Protective Action Criteria (PAC) were established by the Subcommittee on Consequence Assessment and Protective Actions (SCAPA), a subcommittee of the United States Department of Energy (DOE). The PAC serve as a tool for assessing the severity of the consequences of uncontrolled toxic releases. They help to assess the impact of such releases and guide the planning of appropriate responses. Immediately Dangerous to Life or Health (IDLH) limits deal specifically with high concentrations of hazardous substances that can endanger human life. This standard focuses primarily on the protection of workers in industrial environments and provides a basis for assessing and selecting appropriate protective equipment for use in the workplace.
According to Figure 12, the MDH-distance differs markedly between conditions. It reached its maximum value under atmospheric stability class F and a wind speed of 1.5 m/s, and its lowest value under atmospheric stability class D and a wind speed of 8.5 m/s. At the PAC-1 level, the MDH-distance ranged from 822 to 1354 m, a relative difference of 64.72%. At the PAC-2 level, it ranged from 555 to 1204 m, a relative difference as high as 116.94%. There was also a major difference between the MDH-distance ranges of PAC-3 and IDLH. According to these findings, the MDH-distance is greater at low wind speeds and high atmospheric stability. The reason is that, when wind speeds are low and atmospheric stability is high, air movement is limited, which makes it difficult for pollutants to diffuse and dilute quickly. Consequently, the downwind transport of contaminants slows, increasing the likelihood of accumulation and expanding the MDH-distance. Under similar wind speeds, increased atmospheric stability reduces updrafts, resulting in slower vertical mixing of pollutants. As a result, the dilution of the pollutants is reduced, leading to an increase in the MDH-distance. This finding highlights the impact of wind speed and atmospheric stability on pollutant dispersal and establishes a crucial scientific foundation for environmental regulation.
It is vital to emphasise that exceeding the PAC-2 threshold endangers individuals' health and weakens their capacity for self-protection. Therefore, authorities need to issue real-time warnings based on the wind speed and atmospheric stability class and evacuate people in time to prevent severe safety incidents. In addition, ensuring a high level of safety is particularly important when undertaking rescue activities or emergency responses. Beyond the IDLH threshold, only highly reliable respiratory equipment is permitted, to ensure that personnel are adequately protected in the performance of their duties. These safety measures are effective not only at reducing the risk of exposure to rescuers but also at ensuring the safety of the public.
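The quantities reported above can be reproduced with a few lines of arithmetic. In the sketch below, the relative difference is taken with respect to the lower bound of each range, which matches the percentages quoted in the text; the `mdh_distance` helper and the sample concentration profile are hypothetical and only illustrate how a hazardous distance could be read from predicted centreline concentrations.

```python
def relative_difference(low, high):
    """Relative difference (%) of `high` with respect to `low`."""
    return (high - low) / low * 100.0


def mdh_distance(distances_m, concentrations, threshold):
    """Farthest downwind distance at which the concentration still reaches
    the hazard threshold (same units as the concentrations); 0.0 if none does.
    """
    hazardous = [d for d, c in zip(distances_m, concentrations) if c >= threshold]
    return max(hazardous) if hazardous else 0.0


# Ranges quoted in the text for the PAC-1 and PAC-2 levels.
pac1 = round(relative_difference(822, 1354), 2)   # 64.72
pac2 = round(relative_difference(555, 1204), 2)   # 116.94

# Hypothetical concentration profile and threshold, for illustration only.
profile_distance = [50, 100, 200, 400, 800]
profile_conc = [120.0, 60.0, 25.0, 9.0, 3.0]
example_mdh = mdh_distance(profile_distance, profile_conc, threshold=10.0)  # 200
```

With a finer grid of downwind distances, the same rule gives a correspondingly finer estimate of the MDH-distance for each threshold and weather condition.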

Discussion
Deep learning models have demonstrated considerable potential in forecasting the dispersion of hazardous gases due to their capacity to adapt autonomously based on training data, thereby generating more reliable forecasts. However, the efficacy of these models is contingent upon the representativeness of the training data. In particular, when models are constructed using data from specific geographic regions, their applicability may be constrained in other areas with disparate terrains, seasons, or soil types. The dataset utilised in this study originated from a point-source continuous emission experiment, which was of limited duration and did not encompass seasonal variations or changes in geographic location. Consequently, the research investigated the impact of short-term environmental factors, such as wind speed, on the diffusion of hazardous gases. While factors such as seasonal changes and geographic location also affect the dispersion of hazardous gases, these variables were not extensively discussed in this study, given that the aim was to develop innovative deep learning models and validate their performance using data from the Prairie Grass Project. Overall, deep learning models demonstrate great potential in predicting the dispersion of hazardous gases. However, to advance deep learning models in the field of environmental sustainability, it is essential to explore and enhance their applicability and accuracy in diverse environments.

Conclusions
This paper presented exploratory research aimed at addressing the problem of predicting pollutant dispersion using deep learning techniques. The main contributions are outlined below.
1. The proposed model exhibited remarkable predictive efficacy. Ablation experiments substantiated the significant impact of the key components, namely TimeGAN, SA, TCN, and ResNet, on the overall model performance. Notably, the removal of the SA mechanism led to an increase of over 25% in the model's RMSE value, which underscores the critical importance of the SA mechanism in enhancing the model's performance. Furthermore, the results of the comparative experiments demonstrated a clear advantage of the proposed model over other models with different configurations. This was confirmed by the superior performance of the proposed model in terms of the RMSE, MAE, and R² values, which were 13.6521, 6.6283, and 0.9697, respectively.
2. This study used GSA to improve the interpretability of the results.The results of the additive-by-addition method identified less significant indicators and the optimal combination of inputs for different scenarios.This provides a practical solution for forecasting

Figure 1. Diagram of basic modules of ResNet.

Figure 2. Diagram of basic modules of a TCN: (a) dilated causal convolution layer, (b) residual block.

Figure 4. The main structure of the proposed model.

Figure 5. Validation of synthetic data quality with real data.

Figure 6. Validation of predictions with real-world data.

Figure 8. Comparison of the effects of attentional mechanisms on model performance.

Figure 9. Comparison of errors for different model combinations.

Table 2. Division of training and test sets.

Table 4. Cross-validation to assess the quality of synthetic data.

Table 5. Improved model performance through regularisation.

Table 6. Results of the ablation experiment (w/o means without).

Table 7. Hazard thresholds for toxic gas SO₂ injuries.