Peer-Review Record

Forecasting Air Pollutant Emissions Using Deep Sparse Transformer Networks: A Case Study of the Ekibastuz Coal-Fired Power Plant

Sustainability 2025, 17(11), 5115; https://doi.org/10.3390/su17115115

by Yurii Andrashko¹, Oleksandr Kuchanskyi²,³,⁴, Andrii Biloshchytskyi⁵,⁶,*, Alexandr Neftissov⁷ and Svitlana Biloshchytska²,⁶,*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 2 May 2025 / Revised: 24 May 2025 / Accepted: 26 May 2025 / Published: 3 June 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

This paper primarily investigates how to predict and reduce emissions from the Ekibastuz coal-fired power plant in Kazakhstan, including sulfur dioxide (SO₂), PM2.5, and NOx. To achieve this goal, the authors first studied the long-memory characteristics in the time series data of these emissions through fractal analysis (Rescaled Range Analysis, R/S method). The study found that the data sequences exhibited long-memory features, indicating that past behavior has a sustained impact on the future. Based on this discovery, the researchers further utilized a deep sparse Transformer network model to predict the concentrations of these emissions. The model was proven to effectively handle interdependencies in long sequences of data, enabling accurate short-term predictions. However, the researchers also noted that the model's long-term predictive ability is limited, with prediction accuracy significantly declining beyond 12 time points.

Overall, the paper's topic has significant practical relevance, and the research methods are scientific and reasonable but still require further refinement of the model and application research.

The study only used three emission indicators for model validation, and the analysis results for other indicators were not presented in the paper. The completeness of more indicators and the quality of data need to be considered to improve the comprehensiveness of the model.
The paper mentions that the deep sparse Transformer network model is sensitive to missing data and noise. A detailed analysis of the model's requirements for data quality is needed, along with methods for data preprocessing, to enhance the model's stability and robustness.
The paper mentions a Convolutional Neural Network-Long Short-Term Memory-Attention model but does not provide a performance comparison between the two models. A more comprehensive comparative analysis is suggested to demonstrate the advantages and limitations of the proposed model.
The paper states that the generalization of the model may be affected by the specific operations and equipment of coal-fired power plants. An exploration of the model's generalization ability and its potential application in different coal-fired power plants is needed.

Author Response

1. The study only used three emission indicators for model validation, and the analysis results for other indicators were not presented in the paper. The completeness of more indicators and the quality of data need to be considered to improve the comprehensiveness of the model.

2. The paper mentions that the deep sparse Transformer network model is sensitive to missing data and noise. A detailed analysis of the model's requirements for data quality is needed, along with methods for data preprocessing, to enhance the model's stability and robustness.

The dataset was checked for missing values. It contains only a small number of gaps (from 2 to 4), but these gaps are long (from 124 to 1560 points). Short gaps can be filled using interpolation methods, but gaps of this length make it difficult to fill the series by interpolation and then apply the DSTN model and the R/S analysis method; using either on a time series with gaps of this length makes no sense. Therefore, in our case, it was decided to exclude the parts of the time series containing long gaps from consideration. As part of the study, statistical outliers in the sample were also analyzed. For this purpose, a deviation of three standard deviations (3σ) from the mean value was used, one of the classical approaches to detecting anomalous values. As a result, 158 outliers were identified in the SO2 series, 346 in the NOx series, and none in the PM2.5 series. This is 0.254% of the total data volume. At the same time, none of the detected values exceeds the threshold of five standard deviations (5σ), which indicates the absence of extreme anomalies. Given the low proportion of outliers and their limited distance from the mean, it was decided not to remove these observations from further analysis, since their impact on the overall data structure is insignificant.

We have added this information to the article.

3. The paper mentions a Convolutional Neural Network-Long Short-Term Memory-Attention model but does not provide a performance comparison between the two models. A more comprehensive comparative analysis is suggested to demonstrate the advantages and limitations of the proposed model.

Thank you for your recommendation. The CNN–LSTM–Attention model was mentioned in the literature review to highlight popular models used to predict pollutant emissions. However, implementing such a model is outside the terms of reference of the research project that funds this work, and it would require a relatively large amount of time. Accordingly, it will be implemented in future studies so that the results can be compared.

4. The paper states that the generalization of the model may be affected by the specific operations and equipment of coal-fired power plants. An exploration of the model's generalization ability and its potential application in different coal-fired power plants is needed.

Thank you for your comments. Indeed, the article states that the specifics of a particular power plant's equipment and operating modes may limit the model's generalization. The project team has access to data from only one coal-fired power plant. However, this power plant is of strategic importance for the sustainability of the entire region's energy system: it is the largest in the region and one of the largest in the world. Studies confirm that the developed DSTN model can be used in the operation of other coal-fired power plants, but it may need further refinement. This should be studied separately.

Reviewer 2 Report

Comments and Suggestions for Authors

Attached please find the comments.

Comments for author File: Comments.pdf

Author Response

1. The paper mentions missing values in the dataset but lacks details on their proportion and handling methods (e.g., deletion/imputation). Please clarify how data integrity was ensured and validate the impact of missing value treatment on model performance.

Thank you for your comments. The dataset was checked for missing values. It contains only a small number of gaps (from 2 to 4), but these gaps are long (from 124 to 1560 points). Short gaps can be filled using interpolation methods, but gaps of this length make it difficult to fill the series by interpolation and then apply the DSTN model and the R/S analysis method; using either on a time series with gaps of this length makes no sense. Therefore, in our case, it was decided to exclude the parts of the time series containing long gaps from consideration.
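
The gap handling described in this response could be sketched roughly as follows. This is an illustrative sketch only, not the authors' code: the function names, the `max_gap` threshold, and the use of linear interpolation for short gaps are all assumptions, since the response does not state which interpolation method or length cutoff was applied.

```python
import math


def is_missing(x):
    """Treat None and NaN as missing observations (an assumed encoding)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def split_on_long_gaps(values, max_gap=3):
    """Split a series into contiguous segments, excluding long gaps.

    Runs of at most `max_gap` missing points that have valid neighbours on
    both sides are filled by linear interpolation. Longer runs (and runs at
    either end of the series) close the current segment instead.
    `max_gap` is a hypothetical threshold, not a value from the article.
    """
    segments, current = [], []
    i, n = 0, len(values)
    while i < n:
        if not is_missing(values[i]):
            current.append(float(values[i]))
            i += 1
            continue
        # Find the end of this run of missing values.
        j = i
        while j < n and is_missing(values[j]):
            j += 1
        run = j - i
        if run <= max_gap and current and j < n:
            # Short interior gap: interpolate between the two neighbours.
            left, right = current[-1], float(values[j])
            for k in range(1, run + 1):
                current.append(left + (right - left) * k / (run + 1))
        else:
            # Long gap: exclude it and start a new segment afterwards.
            if current:
                segments.append(current)
            current = []
        i = j
    if current:
        segments.append(current)
    return segments
```

Each returned segment is gap-free, so an analysis such as R/S or a forecasting model can be run per segment rather than across a long gap.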

As part of the study, statistical outliers in the sample were also analyzed. For this purpose, a deviation of three standard deviations (3σ) from the mean value was used, one of the classical approaches to detecting anomalous values. As a result, 158 outliers were identified in the SO2 series, 346 in the NOx series, and none in the PM2.5 series. This is 0.254% of the total data volume. At the same time, none of the detected values exceeds the threshold of five standard deviations (5σ), which indicates the absence of extreme anomalies. Given the low proportion of outliers and their limited distance from the mean, it was decided not to remove these observations from further analysis, since their impact on the overall data structure is insignificant.
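
A minimal sketch of the 3σ check described above, for readers who want to reproduce this kind of screening on their own data. The function name is hypothetical, and the use of the population standard deviation is an assumption; the response does not say whether the sample or population form was used.

```python
import statistics


def flag_outliers(values, k=3.0):
    """Return indices of points farther than k standard deviations from the mean.

    Uses the population standard deviation (an assumption). With k=3 this is
    the 3-sigma rule; calling it again with k=5 checks for extreme anomalies.
    """
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return []  # A constant series has no outliers under this rule.
    return [i for i, v in enumerate(values) if abs(v - mean) > k * sd]
```

Running the function once with `k=3.0` and once with `k=5.0` mirrors the two thresholds mentioned in the response: the first gives the count of flagged points, and an empty result from the second indicates that none of them is extreme.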

We have added this information to the article.

2. The validation relies solely on data from one power plant. How can the model's generalizability to other coal-fired plants or pollutant types be demonstrated?

Thank you for your recommendation. The SO₂, NOx, and PM2.5 indicators were selected as priorities due to their high toxicity and high measurement frequency in the source dataset. In addition, these substances are the main markers of pollution from coal burning in accordance with the recommendations of WHO and the European Environmental Protection Agency and are described in the relevant regulatory acts. Other air pollutant emissions were also analyzed during the study, but there were no significant discrepancies with the presented results regarding the values of the Hurst exponent and the accuracy of the DSTN model. Therefore, the SO2, NOx, and PM2.5 indicators were selected for the study description. We have added this information to the article. Indeed, the article states that the specifics of a particular power plant's equipment and operating modes may limit the model's generalization. The project team has access to data from only one coal-fired power plant. However, this power plant is of strategic importance for the sustainability of the entire region's energy system: it is the largest in the region and one of the largest in the world. Studies confirm that the developed DSTN model can be used in the operation of other coal-fired power plants, but it may need further refinement. This should be studied separately.

3. The reliability of R/S analysis for non-stationary time series is debated. Were alternative methods (e.g., MF-DFA) used to confirm the robustness of long-term dependency conclusions?

In addition to R/S analysis, we implemented the DFA method, which allowed us to confirm long-term dependencies in the time series. Both methods yielded similar results for the Hurst exponent, which indicates the reliability of the conclusions about the data structure. Since R/S analysis is easier to implement and the input time series is free of gaps, we decided to focus on it in this study. We have added this information to the article.
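
For context, a classical R/S estimate of the Hurst exponent can be sketched as below. This is a generic textbook-style version under stated assumptions (non-overlapping windows whose length doubles from a hypothetical `min_window`, and an ordinary least-squares fit on the log-log points); it is not the authors' implementation, and the function names are illustrative.

```python
import math
import statistics


def rescaled_range(chunk):
    """R/S statistic of one window: range of cumulative deviations over std."""
    mean = statistics.fmean(chunk)
    cum, lo, hi = 0.0, 0.0, 0.0
    for x in chunk:
        cum += x - mean
        lo, hi = min(lo, cum), max(hi, cum)
    sd = statistics.pstdev(chunk)
    return (hi - lo) / sd if sd > 0 else None


def hurst_rs(series, min_window=8):
    """Estimate the Hurst exponent as the slope of log(R/S) against log(window)."""
    n = len(series)
    log_w, log_rs = [], []
    window = min_window
    while window <= n // 2:
        # Average R/S over non-overlapping windows of this length.
        rs_values = []
        for start in range(0, n - window + 1, window):
            rs = rescaled_range(series[start:start + window])
            if rs is not None:
                rs_values.append(rs)
        if rs_values:
            log_w.append(math.log(window))
            log_rs.append(math.log(statistics.fmean(rs_values)))
        window *= 2
    if len(log_w) < 2:
        raise ValueError("series too short for an R/S estimate")
    # Ordinary least-squares slope of the log-log points.
    mx, my = statistics.fmean(log_w), statistics.fmean(log_rs)
    sxx = sum((x - mx) ** 2 for x in log_w)
    sxy = sum((x - mx) * (y - my) for x, y in zip(log_w, log_rs))
    return sxy / sxx
```

An estimate well above 0.5 is usually read as evidence of long memory, which is the property the response refers to; uncorrelated noise should give a value in the neighbourhood of 0.5, although the plain R/S estimator is known to be biased for short windows.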

4. The "black-box" nature of DSTN hinders actionable insights for parameter optimization. Were attention weights analyzed to identify critical operational parameters affecting emissions?

Thank you for your recommendation. No weight analysis was performed. However, we understand that identifying key technological parameters that affect emissions is an important task for the monitoring system and this study will be conducted in the future.

5. Performance comparisons with classical models (e.g., LSTM, ARIMA) or other Transformer variants are absent. Please add benchmark experiments to justify the model's superiority.

Thank you for your comments. The implementation of these models is outside the terms of reference of the research project that funds this work. However, the ARIMA model was implemented to confirm the results, which are shown in the updated version of the manuscript.

6. The model's 2-hour training time and OOM errors raise concerns about real-time deployment. How will computational efficiency be improved (e.g., model lightweighting or edge computing)?

Thank you for your comments. When implementing the decoder part of the DSTN model, a window with a limited length of 20 points was used. If you have enough computing power, you can use a longer window to provide higher forecast accuracy for values with a horizon of more than 12 points. No other measures were taken to improve computational efficiency in this study.

7. The references should be expanded. Some recent literature might help the authors to further deepen their understanding of the reaction mechanism as well as the newest developments in this field.

Thank you for your recommendation. We have adjusted the list of references.

Reviewer 3 Report

Comments and Suggestions for Authors

This study proposes a novel application of Deep Sparse Transformer Networks (DSTN) for forecasting air pollutant emissions (SO2, PM2.5, and NOx) from a major coal-fired power station in Kazakhstan. The authors' goal is to improve environmental monitoring and industrial management by modeling long-term relationships in emission time series using deep learning methods.

The paper is well-organized, with a precise methodology that clearly highlights the research's environmental relevance. However, several aspects need to be improved:

1- The literature review introduces important research but does not provide a critical comparison of models to other deep learning architectures for environmental forecasting. It is advised that the authors conduct a comparison examination of existing models (e.g., LSTM, CNN, XGBoost) in terms of prediction performance and computational efficiency.

2- The focus on SO2, NOx, and PM2.5 is scientifically valid, however, the selection reason is lacking. It is advised that the authors reinforce their choice more clearly.

3- The DSTN architecture is described, however many details, such as the number of layers, precise hyperparameters, and training-validation split, are either absent or just briefly mentioned.

4- The study adopts data from a single power plant and does not test the model against additional datasets or sites. To improve generalizability, the authors should explore verifying the model with data from a different place, or at the very least simulating such a scenario to evaluate adaptability.

5- The conclusion highlights technological success without delving deeper into societal or policy implications. The authors should consider discussing how such forecasting tools could be integrated into regulatory frameworks, pollution trading schemes, or public health warning systems.

Author Response

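1. The literature review introduces important research but does not provide a critical comparison of models to other deep learning architectures for environmental forecasting. It is advised that the authors conduct a comparison examination of existing models (e.g., LSTM, CNN, XGBoost) in terms of prediction performance and computational efficiency.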

Thank you for your comments. The ARIMA model was implemented to confirm the results, which are shown in the updated version of the manuscript.

2. The focus on SO2, NOx, and PM2.5 is scientifically valid, however, the selection reason is lacking. It is advised that the authors reinforce their choice more clearly.

The SO₂, NOx, and PM2.5 indicators were selected as priorities due to their high toxicity and high measurement frequency in the source dataset. In addition, these substances are the main markers of pollution from coal burning in accordance with the recommendations of WHO and the European Environmental Protection Agency and are described in the relevant regulatory acts. Other air pollutant emissions were also analyzed during the study, but there were no significant discrepancies with the presented results regarding the values of the Hurst exponent and the accuracy of the DSTN model. Therefore, the SO2, NOx, and PM2.5 indicators were selected for the study description. We have added this information to the article.

3. The DSTN architecture is described, however many details, such as the number of layers, precise hyperparameters, and training-validation split, are either absent or just briefly mentioned.

In the methodology section, a detailed description of the DSTN architecture has been added: two encoder and two decoder DSTN blocks, each with an attention window w = 20, embedding dimension = 64, learning rate = 0.001, Adam optimizer, batch size = 64, and a train/test split of 80/20. Training took 2 hours on an NVIDIA RTX 3060. We have added this information to the article.
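
To make these settings concrete, the sketch below collects the stated values in one place and shows one common way a fixed attention window can be expressed as a mask. Only the numeric values come from the response. The dictionary keys, the function names, the causal banded form of the mask, and the chronological (unshuffled) split are assumptions for illustration; the response does not specify the exact sparsity pattern or how the 80/20 split was drawn.

```python
# Values as stated in the author response; key names are hypothetical.
HYPERPARAMS = {
    "encoder_blocks": 2,
    "decoder_blocks": 2,
    "attention_window": 20,
    "embedding_dim": 64,
    "learning_rate": 0.001,
    "optimizer": "adam",
    "batch_size": 64,
    "train_fraction": 0.8,
}


def local_attention_mask(seq_len, window):
    """Boolean mask where position i may attend to the `window` most recent
    positions up to and including itself (an assumed causal, banded pattern)."""
    return [[0 <= i - j < window for j in range(seq_len)] for i in range(seq_len)]


def chronological_split(series, train_fraction):
    """Split a series into train and test parts without shuffling (assumed)."""
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]
```

With a window of 20, each position attends to at most 20 positions instead of the whole sequence, so the attention cost grows linearly with sequence length rather than quadratically. This is the usual motivation for windowed sparse attention on long series.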

4. The study adopts data from a single power plant and does not test the model against additional datasets or sites. To improve generalizability, the authors should explore verifying the model with data from a different place, or at the very least simulating such a scenario to evaluate adaptability.

Indeed, the article states that the specifics of a particular power plant's equipment and operating modes may limit the model's generalization. The project team has access to data from only one coal-fired power plant. However, this power plant is of strategic importance for the sustainability of the entire region's energy system: it is the largest in the region and one of the largest in the world. Studies confirm that the developed DSTN model can be used in the operation of other coal-fired power plants, but it may need further refinement. This should be studied separately.

5. The conclusion highlights technological success without delving deeper into societal or policy implications. The authors should consider discussing how such forecasting tools could be integrated into regulatory frameworks, pollution trading schemes, or public health warning systems.

Thank you for your comments. We have added this information to the article. The results of the study correspond to the goals of sustainable development: SDG 3 (health and well-being), SDG 7 (affordable and clean energy), SDG 11 (sustainable cities), SDG 13 (combating climate change). In particular, the introduction of predictive environmental monitoring contributes to reducing emissions, protecting public health, and developing sustainable regional management strategies.

Round 2

Reviewer 2 Report

Comments and Suggestions for Authors

I am very satisfied with the updated and revised version of the manuscript. The authors have greatly improved the quality of the data presented and have addressed all my comments and concerns. I therefore recommend the publication of this research study in its present form.
