Applied Sciences
  • Article
  • Open Access

12 March 2023

Predicting PM2.5, PM10, SO2, NO2, NO and CO Air Pollutant Values with Linear Regression in R Language

Technical Faculty “Mihajlo Pupin”, University of Novi Sad, 23000 Zrenjanin, Serbia
*
Author to whom correspondence should be addressed.

Abstract

Air pollution is one of the most challenging and complex problems of our time. This research presents the prediction of air pollutant values using an R program based on linear regression. The research sample consists of measured values of air pollutants, namely sulphur dioxide (SO2), particulate matter (PM10, PM2.5), carbon monoxide (CO), and nitrogen oxides (NO, NO2, and NOX), together with meteorological data: atmospheric pressure (p), temperature (T), and relative humidity (rh). The research data were collected from the city of Belgrade air quality monitoring reports, published by the Environmental Protection Agency of the Republic of Serbia. The report data were transformed into a form suitable for processing by the R program and used to derive prediction functions based on linear regression on pairs of air pollutants. In this paper, we describe the R program that was created to correlate air pollutants with linear regression, which results in functions that are used for the prediction of pollutant values. The correlation of pollutants is presented graphically with diagrams created within the R GUI environment. The predicted data were categorized according to air pollution standard ranges. It has been shown that the functions derived from linear regression enable predictions that are well correlated with the data obtained by automatic acquisition from air quality monitoring stations. The R program was created by using R language statements without any additional packages and is therefore suitable for use in a diversity of application domains, with minor adjustments to the appropriate data sets.

1. Introduction

Emissions of dangerous gases into the atmosphere due to accidents, human activities, natural disasters, or other reasons are great threats to the human population, nature, and infrastructure. Air pollution is one of the significant environmental problems that can cause adverse health effects, such as asthma, allergies, infections [1], cancer [2], and the risk of low birth weight [3]. These health-related issues are correlated with air pollution, particularly in traffic [4,5]. Therefore, air pollutants data acquisition, measurements, monitoring, evaluation model formulation [6], assessment [7], benchmarking [8], and forecasting [7,9] became increasingly important, particularly in the circumstances of pandemics [10]. Many efforts have been made in the development of appropriate air pollution-related methods and tools, as well as their integration [11].
Data science software tools and modern programming languages enable working with large amounts of data related to environmental problems by utilizing specific functions and commands for data analysis, advanced reasoning, and visualization [12], with languages such as Python [13] and R [14]. R was developed as a computational environment to enable statistical data analysis by Ross Ihaka and Robert Gentleman from the University of Auckland, New Zealand [15]. R is a free and open-source programming language and software environment. The development and maintenance are assigned to the R Core Team, i.e., the R Foundation for Statistical Computing [16]. One of the most important features of R is its flexibility and scalability, i.e., the possibility to add a set of new functionalities to the base system. This is supported by tools for the development of additional packages [17], so, in this way, R could be used in a diverse set of applications, such as ecology [18].
Air pollution data analysis and the use of R in ecology have been the focus of multiple previous research results [18,19,20,21,22,23,24,25]. However, the literature review of related work shows a research gap, which provides the basis for this research. Other related work has focused on particular air pollutants and their predictions [9,24,26,27,28,29,30,31,32,33,34], while this paper provides predictions for a variety of air pollutants. Other papers have used linear regression in air pollution prediction, but not with the R language [28,32,33,34,35,36,37,38,39]. In previously published papers, R was used for air pollution data analysis, but with methods other than linear regression [23,24,25,40].
The aim of this research is related to using R for air pollution data correlation and prediction. R language statements are used for creating the program that performs data pre-processing with linear regression and establishes the mathematical model for the relationship between two variables. In this research, the dependent and independent variables are selected among air pollutant values for PM2.5, PM10, NO, NO2, NOX, CO, and SO2, as well as meteorological parameters–pressure, temperature, and relative humidity. The obtained mathematical models are used for air pollution data prediction. Results of this study include: (1) a detailed presentation of the developed R program for linear regression and prediction of air pollution data; (2) results of correlation of particular air pollutants, particularly presented with linear regression diagrams; (3) results of linear fitting, i.e., statistical evaluation of linear functions preciseness; and (4) prediction results based on previously obtained and evaluated mathematical functions that correlate air pollutant values.

3. Materials and Methods

Scientific data are often stored in formats not suitable for analysis and processing. Making applications that work with the diversity of data sources and growing databases has become an emerging topic since rapid data availability and processing could be a limiting factor for end-users [8]. The methodology and procedures used in this research include air pollution data collection, pre-processing with the use of the linear regression method, and processing in the prediction. Results of air pollutant values in this research are presented with the use of measurement units as mass units in the volume unit, taking them as related to time. These measurement units could be used over a long period of time for measuring concentrations of gases at shorter intervals [12]. Figure 1 presents a flowchart that visualizes the proposed method of using the R program within the air pollution prediction system. Data collection used in this research consists of air pollution data obtained from the official web site of the Environmental Protection Agency, affiliated with the Ministry of Environmental Protection, Belgrade, Republic of Serbia. Raw measurement data are presented at this website, since 2008, each hour of every day, obtained from automatic data acquisition stations that are located at different places, especially in cities (at crowded streets, industrial zones), but also at protected natural regions in the Republic of Serbia. Measurement data are presented comparatively for multiple measurement stations for each hour [46] or with a detailed data view for every measurement station, each day, and every hour [47].
Figure 1. Flowchart presenting the proposed method in an air pollution prediction system, based on the developed R program.
The sample data for this research were downloaded from [47]. They cover the January-March period in 2021 and 2022 and were selected for Serbia’s capital city, Belgrade. This period was chosen in both years because it is the winter period, when pollution is expected to be higher. Selecting a seasonal sample (winter) is aligned with the results of Zhao et al., where linear regressions on seasonal data gave a more accurate mathematical model than regressions on annual data samples [32]. The City of Belgrade was selected as a very crowded urban area with large industrial and residential parts and heavy traffic. The sample consists of 378 measured air pollutant concentrations in 2021 and 567 measurements in 2022 for the following components: sulphur dioxide (SO2), particulate matter (PM10, PM2.5), carbon monoxide (CO), and nitrogen oxides (NO, NO2, and NOX). All sample air pollutant values were obtained from the data source as raw data expressed in μg/m3, except CO values, which were expressed in mg/m3. In addition to these measurements, the sample data also contain the following meteorological conditions: air temperature (T, degrees Celsius), relative humidity (rh, percent), and atmospheric pressure (p, millibars).
The sample data obtained from the web site were transformed into a tab-delimited text file, suitable for loading into the R GUI (Figure 2), with a data structure consisting of city/time/SO2/NO2/NOX/CO/NO/PM2.5/PM10/p/T/rh. The T symbol is regularly used for temperature, while in this research, the sample presents temperature with the symbol t. Data vectors, created from the loaded text file, are used to obtain a linear regression function that establishes the correlation of measured values of two air pollutants.
Figure 2. Air pollutants and meteorological measured values loaded into the R GUI.
The created R program enables the creation of linear regression functions (on all pairs of air pollutants and meteorological parameter values), graphical representation of data correlation in the form of colorized correlation heat maps and linear regression diagrams, as well as the use of linear regression functions and R language functions for the prediction of air pollution data. Finally, results of data processing (i.e., air pollutant value prediction) are organized according to categories of air pollutant concentrations and air quality: excellent, good, acceptable, polluted, very polluted, as defined by the Environmental Protection Agency [48].
The program used in this research was written using the R programming language within the R GUI (graphical user interface) editor, created by R Team (The R Foundation for Statistical Computing c/o Institute for Statistics and Mathematics, Vienna, Austria) [16]. Figure 3 and Figure 4 present the first and second parts of the created R program within the R GUI development environment, respectively.
Figure 3. First part of the created R program: creating data vectors, establishing relations with linear regression, and drawing diagrams in the R GUI.
Figure 4. Second part of the created R program: predicting air pollutant values, fitting the linear model, and printing the results.
The first command in the created R program sets the working directory with the setwd(path) function. The second command reads the data from an external tab-delimited .txt file, which stores the measured air pollutant and meteorological values, with the read.table() function. It creates a table data structure whose title row contains the name of each column. The third step creates vectors of measured pollutant values from this table with the "<-" operator, which assigns values to R structures:
vector <- datatable$columnname
In this program, we created vectors for all pollutants and measured temperature, pressure, and relative humidity.
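The loading steps described above can be sketched as follows. This is a minimal illustration, not the authors' program: the three data rows are made-up values, the file is written to a temporary location instead of a directory set with setwd(), and the vector names (co_values, no_values, pm25_values) are hypothetical.

```r
# Minimal sketch: load a tab-delimited file and build one vector per column.
# The three data rows below are made-up values for illustration only.
sample_path <- tempfile(fileext = ".txt")
writeLines(c(
  "city\ttime\tSO2\tNO2\tNOX\tCO\tNO\tPM2.5\tPM10\tp\tT\trh",
  "Belgrade\t01.01.2021 01:00\t12.1\t30.5\t55.2\t0.61\t16.1\t41.0\t52.3\t1003\t2.1\t88",
  "Belgrade\t01.01.2021 02:00\t11.8\t28.9\t49.7\t0.55\t13.6\t39.4\t50.1\t1003\t1.9\t89",
  "Belgrade\t01.01.2021 03:00\t12.5\t27.2\t44.0\t0.52\t11.0\t37.8\t47.6\t1004\t1.6\t90"
), sample_path)

# header = TRUE takes the column names from the title row;
# sep = "\t" keeps the space inside the time field intact.
datatable <- read.table(sample_path, header = TRUE, sep = "\t",
                        stringsAsFactors = FALSE)

# One vector per measured quantity (hypothetical vector names).
co_values   <- datatable$CO
no_values   <- datatable$NO
pm25_values <- datatable$PM2.5

str(datatable)
```

With a real export, only the path and the set of extracted columns would need to change.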
Relations between pollutants were established by linear regression with the lm() function in R (Figure 3):
lm(vectorpollutant2 ~ vectorpollutant1)
where vectorpollutant1 is x and vectorpollutant2 is y in the linear function y = ax + b.
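As an illustration of how the coefficients a and b of y = ax + b can be read from the fitted model, the following sketch uses synthetic data. The vector names and the generating values 0.01 and 0.3 are assumptions made only for this example.

```r
# Minimal sketch with synthetic data: fit y = a*x + b and read a and b.
set.seed(1)
no_values <- runif(50, min = 5, max = 120)                   # predictor x
co_values <- 0.3 + 0.01 * no_values + rnorm(50, sd = 0.1)    # response y

model <- lm(co_values ~ no_values)

# coef() returns the intercept (b) first and the slope (a) second.
b <- coef(model)[["(Intercept)"]]
a <- coef(model)[["no_values"]]

cat(sprintf("CO = %.4f * NO + %.4f\n", a, b))
```

The fitted slope and intercept should land close to the generating values, which is a quick sanity check that the roles of x and y in the formula are the intended ones.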
Drawing diagrams for the obtained mathematical models was completed with three commands (Figure 3). First, we defined a .png image with a name:
png(file = "imagefilename.png")
After this command, the plot function creates the diagram based on the results of the lm function, while the dev.off function saves the diagram in a file named with the png command and stored at the location defined by the setwd command.
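The three plotting steps could look like the sketch below. It assumes that the diagram is a scatter plot of the two vectors with the fitted line added by abline(); the source does not show the exact plot call, so this is one plausible way to do it. The data, axis labels, and file name are hypothetical, and the image is written to a temporary directory.

```r
# Minimal sketch: save a scatter plot with the fitted regression line as PNG.
set.seed(2)
pm25_values <- runif(40, min = 10, max = 100)
pm10_values <- 1.3 * pm25_values + rnorm(40, sd = 8)
model <- lm(pm10_values ~ pm25_values)

out_file <- file.path(tempdir(), "pm10_pm25.png")

# Guard for R builds compiled without PNG support.
if (capabilities("png")) {
  png(file = out_file)                       # 1. open the image file
  plot(pm25_values, pm10_values,             # 2. draw the data points
       xlab = "PM2.5", ylab = "PM10",
       main = "Linear regression PM10 ~ PM2.5")
  abline(model, col = "red")                 #    add the fitted line
  invisible(dev.off())                       # 3. close the device, write file
}
```

Note that calling plot() directly on an lm object produces diagnostic plots (residuals, Q-Q, and so on) rather than the data with the fitted line.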
Predicting air pollutant values is possible with a predictor vector, response vector, and linear regression function (Figure 4). Commands for this purpose are listed below:
linear_model <- lm(vectorpollutantA ~ vectorpollutantB, data = datatable)
# the column name in newdata must match the predictor name used in lm()
variable_vectorpollutantB <- data.frame(vectorpollutantB = vectorpollutantBprediction)
predict(linear_model, newdata = variable_vectorpollutantB)
predict(linear_model, newdata = variable_vectorpollutantB, interval = "confidence")
A linear model is created by applying the lm function to two data vectors, where the first vector in the formula takes the y role and the second takes the x role in the resulting linear function. The linear model is a mathematical function representing the correlation of the data vectors. For prediction, another data vector was prepared as an input data set of air pollutant values or meteorological data. From these data and the linear model, the predict() function of the R language derives the predicted values. The predict() function was then executed again with a third argument, interval = "confidence", which returns, for each predicted value, the lower and upper limits of its confidence interval and so indicates the precision of the mathematical model obtained with linear regression (Figure 4). Finally, the summary command in R provides statistics on the obtained linear model and on its fit (i.e., a statistical computation of the accuracy of the mathematical model that was used for prediction).
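A self-contained sketch of the prediction step is given below, using synthetic data and hypothetical input values (20, 60, 100). It also shows interval = "prediction", which the text does not use: a confidence interval describes the uncertainty of the mean response, while a prediction interval describes where a single new measurement is expected to fall, so it is always wider.

```r
# Minimal sketch with synthetic data: point predictions and both interval types.
set.seed(3)
datatable <- data.frame(NO = runif(60, min = 5, max = 120))
datatable$CO <- 0.3 + 0.01 * datatable$NO + rnorm(60, sd = 0.1)

linear_model <- lm(CO ~ NO, data = datatable)

# newdata must contain a column named like the predictor in the formula (NO).
new_no <- data.frame(NO = c(20, 60, 100))

point_pred <- predict(linear_model, newdata = new_no)
conf_int   <- predict(linear_model, newdata = new_no, interval = "confidence")
pred_int   <- predict(linear_model, newdata = new_no, interval = "prediction")

print(conf_int)   # columns: fit, lwr, upr
print(pred_int)

# Model statistics: coefficients, standard errors, R-squared, F-statistic.
print(summary(linear_model))
```

If the column in newdata is named differently from the predictor, predict() falls back to the original data and returns fitted values for the training sample instead of the new inputs, usually with a warning.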

4. Results and Discussion

Previously published works used R for other ecology-related research, processed air pollution data with special R packages, used linear regression for other purposes, or relied on a diversity of other software tools and technologies. Compared with them, this work contributes a detailed presentation of a specially created R program written with R language statements. This program enables data correlation with linear regression, fitting of the derived mathematical functions, and prediction of air pollution data.
Results of executing the R program for linear regression and prediction statistics are presented with one example, for the CO (dependent) and NO (independent) data vectors, in Figure 5 and Figure 6.
Figure 5. Example of R program execution for linear regression on the CO and NO data vectors and the resulting statistics regarding mathematical model accuracy.
Figure 6. Example of results from executing the predict function for CO data vectors based on correlation with NO (within the execution of the created R program).
From Figure 5, it could be concluded that the r (correlation coefficient) for CO values being predicted from NO values has a value of 0.8515, which could be categorized as a high level of correlation.
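For a regression with a single predictor, the correlation coefficient r and the "Multiple R-squared" value printed by summary() are linked by R² = r², with the sign of r given by the sign of the slope. The sketch below checks this on synthetic data; the vector names and generating values are assumptions for illustration.

```r
# Minimal sketch: recover r from the model summary and compare with cor().
set.seed(4)
no_values <- runif(70, min = 5, max = 120)
co_values <- 0.3 + 0.01 * no_values + rnorm(70, sd = 0.2)

model <- lm(co_values ~ no_values)

r_squared    <- summary(model)$r.squared
slope        <- coef(model)[["no_values"]]
r_from_model <- sign(slope) * sqrt(r_squared)

r_direct <- cor(no_values, co_values)   # Pearson correlation coefficient

print(c(r_from_model = r_from_model, r_direct = r_direct))
```

This makes it easy to check whether a value read from a summary printout is r itself or its square.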
Results of executing the created R program (that includes utilization of the predict() R function) within the R GUI are shown in Figure 6. This is an example of successfully predicting future values of a CO pollutant based on the previously generated linear model of correlated CO and NO. Figure 6 presents three columns of data generated by the predict function for each item of the input data vector; there are computed values of prediction fit, but also lwr (lower) and upr (upper) values for each fit value. This way, it is obvious that the prediction function does not provide only one value, but an interval of values that could be expected in the future, using the underlying linear function as a basis.
For this particular case of correlation between CO and NO values, Figure 7 presents a diagram that enables comparison between the computed prediction data and the real measurement data for CO as being dependent on the NO values data set in the previously presented prediction function.
Figure 7. Diagram of comparatively presented real measurement data of CO and predicted values (range) of CO, based on linear correlation with NO values.
Figure 7 contains graphs representing predicted data for CO for each particular measurement of NO as a lower (min), fit (mid), and upper (max) predicted value, as well as a graph presenting CO measured values (at the same time and at the same monitoring station as the NO value that was used for prediction). Since the r correlation coefficient between NO and CO is 0.8515, it is obvious from Figure 7 that the measurements and prediction graphs for CO are not very closely aligned.
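One way to quantify such a comparison between measured and predicted values is sketched below on synthetic data. It is not part of the described program: it uses a prediction interval (rather than the confidence interval used in the text) and two summary numbers chosen for this example, the share of measurements inside the interval and the root mean square error.

```r
# Minimal sketch: compare measured values with the predicted range.
set.seed(6)
datatable <- data.frame(NO = runif(80, min = 5, max = 120))
datatable$CO <- 0.3 + 0.01 * datatable$NO + rnorm(80, sd = 0.15)

linear_model <- lm(CO ~ NO, data = datatable)

# Prediction interval: range expected to contain a single new measurement.
pred <- predict(linear_model, newdata = datatable, interval = "prediction")

inside   <- datatable$CO >= pred[, "lwr"] & datatable$CO <= pred[, "upr"]
coverage <- mean(inside)                                  # share inside range
rmse     <- sqrt(mean((datatable$CO - pred[, "fit"])^2))  # typical error size

cat(sprintf("Inside interval: %.1f%%, RMSE: %.3f\n", 100 * coverage, rmse))
```

A confidence interval for the mean is much narrower, so far fewer individual measurements would fall inside it even when the model fits well.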
The correlation heat map, presented in Figure 8, shows the values of the r correlation coefficients computed on pairs of data vectors for all obtained parameters, including air pollutants and meteorological parameters.
Figure 8. Correlation heat map with all obtained air pollution parameters from the sample.
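A correlation heat map of this kind can be drawn with base R alone, as in the following sketch. The data are synthetic, the colour scale and file name are arbitrary choices, and the source does not state which functions produced Figure 8, so this is only one possible approach.

```r
# Minimal sketch: correlation matrix and a colour-coded heat map in base R.
set.seed(5)
n  <- 100
no <- runif(n, min = 5, max = 120)
measurements <- data.frame(
  NO  = no,
  NOX = 1.5 * no + rnorm(n, sd = 10),
  CO  = 0.3 + 0.01 * no + rnorm(n, sd = 0.2),
  SO2 = rnorm(n, mean = 15, sd = 5)
)

# Pearson r for every pair of columns.
r_matrix <- cor(measurements)
print(round(r_matrix, 4))

# Guard for R builds compiled without PNG support.
if (capabilities("png")) {
  k <- ncol(r_matrix)
  png(file = file.path(tempdir(), "correlation_heatmap.png"))
  image(1:k, 1:k, r_matrix, zlim = c(-1, 1),
        col = colorRampPalette(c("blue", "white", "red"))(21),
        axes = FALSE, xlab = "", ylab = "")
  axis(1, at = 1:k, labels = colnames(r_matrix))
  axis(2, at = 1:k, labels = rownames(r_matrix))
  # Write each coefficient into its cell.
  text(row(r_matrix), col(r_matrix), sprintf("%.2f", r_matrix))
  invisible(dev.off())
}
```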
According to the correlation heat map, high correlation (r values of 0.7 or above) was computed for the pairs NO-NOX (r = 0.9809), CO-NOX (r = 0.8667), NO-CO (r = 0.8515), PM2.5-PM10 (r = 0.7895), and NO2-NOX (r = 0.7558), while moderate correlation (r values from 0.5 to 0.7) was found for six air pollutant pairs. The computed correlation between meteorological parameters and air pollutants is very low, with r values less than 0.2. Therefore, in the rest of this paper, these pairs of parameters are not presented with linear regression diagrams, predictions, and detailed statistics. The linear equations (as results of the linear regression method) are presented for selected pairs of air pollutants (from the high, moderate, and low correlation categories) as bivariate graphical plots (Figure 9a–f) that describe their mutual dependence, i.e., inter-variable correlations. Dots represent data from the vectors, while the line plotted through the dots is the graphical representation of the linear equation derived from the data in the vectors. Apart from the graphical representation (the quantity of dots placed near the line), a detailed statistical analysis of the correlation between the model and the data can also be conducted on the basis of the statistical data presented in Tables 1–6.
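The grouping of pairs by correlation strength can be sketched as follows. The data are synthetic, and the class limits (0.5, 0.7, 0.9) are one reading of the ranges mentioned in the text, so they should be treated as assumptions rather than as the authors' exact rule.

```r
# Minimal sketch: list all variable pairs with r and a strength label.
set.seed(7)
n  <- 100
no <- runif(n, min = 5, max = 120)
measurements <- data.frame(
  NO  = no,
  NOX = 1.5 * no + rnorm(n, sd = 10),
  CO  = 0.3 + 0.01 * no + rnorm(n, sd = 0.2),
  SO2 = rnorm(n, mean = 15, sd = 5)
)

r_matrix <- cor(measurements)

# Each pair appears once: take only the upper triangle of the matrix.
idx <- which(upper.tri(r_matrix), arr.ind = TRUE)
pair_table <- data.frame(
  pair = paste(rownames(r_matrix)[idx[, "row"]],
               colnames(r_matrix)[idx[, "col"]], sep = "-"),
  r    = r_matrix[idx]
)

# Assumed class limits for |r|; adjust to the classification actually used.
pair_table$strength <- cut(abs(pair_table$r),
                           breaks = c(0, 0.5, 0.7, 0.9, 1),
                           labels = c("low", "moderate", "high", "very high"),
                           include.lowest = TRUE)

# Strongest pairs first.
pair_table <- pair_table[order(-abs(pair_table$r)), ]
print(pair_table, row.names = FALSE)
```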
Figure 9. Linear regression diagrams created within R GUI with R program (a) PM10-PM2.5; (b) CO-NO; (c) SO2-NO2; (d) NO-NOX; (e) CO-NO2; (f) NO2-NOX.
Table 1. Summary statistics from linear regression model for PM2.5 and PM10.
Table 2. Summary statistics from linear regression model for CO and NO.
Table 3. Summary statistics from linear regression model for SO2 and NO2.
Table 4. Summary statistics from linear regression model for CO and NO2.
Table 5. Summary statistics from linear regression model for NO and NOX.
Table 6. Summary statistics from linear regression model for NO2 and NOX.
Standard statistical parameters such as Fisher’s criterion (F), the correlation coefficient (r), the estimate and residual standard error, the p-value, and the t-value were used for model validation. According to these parameters, Tables 1–6 show moderate, high, and very high positive correlations with the model. The best correlation and fit to the data was obtained for the NO-NOX pair (r = 0.9809), which had a high F-value and low p-values and standard errors. The correlation between CO and NO was strong (r = 0.8515), which can be seen from the statistical parameters and the graph. A high correlation was also found for PM2.5-PM10 (r = 0.7895) and NO2-NOX (r = 0.7558), again with high F-values and low p-values and standard errors. The CO-NO2 pair showed a moderate correlation with the model (r = 0.6305). The weakest interdependence was shown by the SO2-NO2 pair (r = 0.4468), which had the lowest F-value and the greatest dispersion of data.
A good correlation between NO2, NO, and NOX (r~0.68 to 0.98) can be explained by the chemical similarity of the compounds and the chemical pathway of their formation [49].
N2 + O2 → 2NO,
NO + O → NO2,
NO + O3 → NO2 + O2.
According to the correlation heat map (Figure 8), there is a weak correlation (r < 0.5) between chemically different oxides of carbon, sulphur, and nitrogen. The specific case is CO, which establishes a high and medium correlation with all air pollutants, except with SO2 (r = 0.3349).
Table 7 shows the summarized and categorized results of predicting air pollutant values for SO2, PM10, PM2.5, CO, and NO2. Part of the data that is the source for Table 7 was previously presented in Figure 8. The prediction data in Table 7 were categorized according to air pollutant concentration indexes as excellent, good, acceptable, polluted, and very polluted, and colored according to the criteria defined by the Environmental Protection Agency of the Ministry of Environmental Protection, Republic of Serbia [48], which are aligned with the standards defined in the European Union. The class limits differ for each pollutant.
Table 7. Summarized and categorized results of predicting air pollutant values with R.
According to the results presented in Table 7, it could be concluded that the great majority of predicted values could be categorized as excellent or acceptable.
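The categorization step can be sketched with the base R function cut(). The class limits below are hypothetical placeholders chosen only to make the example run; they are not the limits defined in [48], which differ for each pollutant and would have to be taken from that source. The predicted values are also made up.

```r
# Minimal sketch: assign predicted concentrations to air quality categories.
# HYPOTHETICAL class limits (ug/m3) for one pollutant, for illustration only.
class_limits <- c(0, 20, 40, 50, 100, Inf)
class_labels <- c("excellent", "good", "acceptable", "polluted", "very polluted")

# Made-up predicted values, e.g. the "fit" column returned by predict().
predicted_pm10 <- c(12.4, 35.0, 47.9, 80.2, 130.5)

# right = FALSE makes each class include its lower limit: [0, 20), [20, 40), ...
category <- cut(predicted_pm10, breaks = class_limits,
                labels = class_labels, right = FALSE)

print(data.frame(predicted_pm10, category))
print(table(category))   # number of predictions per category
```

Keeping the limits in a named vector per pollutant would make it straightforward to apply the same call to each pollutant in turn.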

5. Conclusions

This paper contributes an R program that enables data transformation, linear regression, mathematical model fitting, and the processing of prediction functions. This program also enables the graphical presentation of results in the form of diagrams that illustrate the correlation between pollutants. This research also contributes to the evaluation of linear regression functions for their accuracy. It has been shown that the resulting functions enable predictions with high precision; the predicted values correlate very well with the obtained data. The developed R program was created with R language statements without using specific R packages. Therefore, it could be used or easily adapted to diverse application domains. The second contribution is related to the results of data correlation for all types of air pollutants that are included in the most frequently used air quality monitoring systems. It has been shown that certain air pollutant pairs have a high level of correlation, since their linear regression function has a high level of fitting. The limitations of this work are related to several aspects. The R program is not presented entirely in this paper (because of the program’s length); only the most important lines of R code are presented and explained, but the full program is available in a public GitHub repository. The second limitation is related to the sample characteristics: one city, two years, and only a winter period of three months in both years; not all atmospheric parameters were included (e.g., data from monitoring stations in this sample do not provide wind-related data).
Future work could be related to the utilization of the created R program to make predictions based on a wider set of parameters, larger data sets taken over longer periods of time, a diversity of monitoring locations, adaptation to other application domains, and improving the program to use other statistical methods supported by the R language, with a special emphasis on supporting further chemical analysis of complex interactions and processes with gases in the atmosphere.

Author Contributions

Conceptualization, Z.K. and L.K.; methodology, Z.K. and L.K.; software, Z.K.; validation, Z.K., S.F. and L.K.; formal analysis, Z.K. and S.F.; investigation, Z.K. and L.K.; resources, Z.K. and L.K.; data curation, Z.K. and S.F.; writing—original draft preparation, Z.K., S.F. and L.K.; writing—review and editing, Z.K., L.K. and S.F.; visualization, Z.K.; supervision, Z.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The R GUI tool is open source software that has been used in this research. It is available at https://www.r-project.org. The created R program is open-sourced and the code is available at https://github.com/AirPolWRL/APPWRL (accessed on 1 September 2022). Data for this research are obtained from http://www.amskv.sepa.gov.rs/pregledpodatakazbirni.php?lng=en (accessed on 1 January 2021). The datasets analyzed in this study are available in the repository: https://github.com/AirPolWRL/APPWRL (accessed on 1 September 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Brauer, M.; Hoek, G.; Smit, H.A.; De Jongste, J.C.; Gerritsen, J.; Postma, D.S.; Kerkhof, M.; Brunekreef, B. Air pollution and development of asthma, allergy and infections in a birth cohort. Eur. Respir. J. 2007, 5, 879–888. [Google Scholar] [CrossRef]
  2. Tusnio, N.; Fichna, J.; Nowakowski, P.; Tofilo, P. Air Pollution Associates with Cancer Incidences in Poland. Appl. Sci. 2020, 10, 7489. [Google Scholar] [CrossRef]
  3. Balogun, H.A.; Rantala, A.K.; Antikainen, H.; Siddika, N.; Amegah, A.K.; Ryti, N.R.I.; Kukkonen, J.; Sofiev, M.; Jaakkola, M.S.; Jaakkola, J.J.K. Effects of Air Pollution on the Risk of Low Birth Weight in a Cold Climate. Appl. Sci. 2020, 10, 6399. [Google Scholar] [CrossRef]
  4. McConnell, R.; Berhane, K.; Yao, L.; Jerrett, M.; Lurmann, F.; Gilliland, F.; Kunzli, N.; Gauderman, J.; Avol, E.; Thomas, D.; et al. Traffic, susceptibility, and childhood asthma. Environ. Health Persp. 2006, 114, 766–772. [Google Scholar] [CrossRef] [PubMed]
  5. Morgenstern, V.; Zutaver, A.; Cyrys, J.; Brockow, I.; Koletzko, S.; Kramer, U.; Behrendt, H.; Herbarth, O.; von Berg, A.; Bauer, P.C.; et al. Atopic diseases, allergic sensitization, and exposure to traffic-related air pollution in children. Am. J. Respir. Crit. Care Med. 2008, 177, 1331–1337. [Google Scholar] [CrossRef] [PubMed]
  6. Olvera-García, M.A.; Carbajal-Hernández, J.J.; Sánchez-Fernández, L.P.; Hernández-Bautista, I. Air quality assessment using a weighted Fuzzy Inference System. Ecol. Inform. 2016, 33, 57–74. [Google Scholar] [CrossRef]
  7. Morley, D.W.; Gulliver, J. A land use regression variable generation, modelling and prediction tool for air pollution exposure assessment. Environ. Modell. Softw. 2018, 105, 17–23. [Google Scholar] [CrossRef]
  8. Betancourt, C.; Hagemeier, B.; Schroder, S.; Schultz, M.G. Context aware benchmarking and tuning of a TByte-scale air quality database and web service. Earth Sci. Inform. 2021, 14, 1597–1607. [Google Scholar] [CrossRef]
  9. Rajat, R.R.; Vaibhav, D.; Ridam, G.; Rahul, P.; Pratik, G.; Mukul, S.; Ritik, J.; Preetee, K. Prediction of Air Quality Index Using Supervised Machine Learning. Int. J. Res. Appl. Sci. Eng. Tech. 2022, 10, 1371–1382. [Google Scholar]
  10. Xing, H.; Zhu, L.; Chen, B.; Niu, J.; Li, X.; Feng, Y.; Fang, W. Spatial and temporal changes analysis of air quality before and after the COVID-19 in Shandong Province, China. Earth Sci. Inform. 2022, 15, 863–876. [Google Scholar] [CrossRef]
  11. Carmichael, G.R.; Sandu, A.; Chai, T.; Daescu, D.N.; Constantinescu, E.M.; Tang, Y. Predicting air quality: Improvements through advanced methods to integrate models and measurements. J. Comput. Phys. 2008, 227, 3540–3571. [Google Scholar] [CrossRef]
  12. Ilijazi, V.; Jacimovski, S.; Milic, N.; Popovic, B. Software-Supported Visualization of Mathematical Spatial-Time Distribution Models of Air-Pollutant Emissions. J. Sci. Ind. Res. 2021, 80, 915–923. Available online: http://op.niscair.res.in/index.php/JSIR/article/view/46963/465479886 (accessed on 30 August 2022).
  13. Kadivala, A.; Kumar, A. Applications of Python to evaluate environmental data science problems. Environ. Prog. Sustain. 2017, 16, 1580–1586. [Google Scholar] [CrossRef]
  14. Dutang, C.; Goulet, V.; Pigeon, M. Actuar: An R package for actuarial science. J. Stat. Softw. 2008, 25, 1–37. [Google Scholar]
  15. Ihaka, R.; Gentleman, R. R: A Language for Data Analysis and Graphics. J. Comput. Graph. Stat. 2012, 5, 299–314. [Google Scholar]
  16. R Foundation for Statistical Computing. R Core Team. R: A Language and Environment for Statistical Computing. Available online: https://cran.r-project.org/doc/manuals/r-release/fullrefman.pdf (accessed on 7 September 2022).
  17. Csárdi, G.; Salmon, M. rhub: Connect to ‘R-hub’. Available online: https://r-hub.github.io/rhub/authors.html (accessed on 7 September 2022).
  18. Frichot, E.; Francois, O. LEA: An R package for landscape and ecological association studies. Methods Ecol. Evol. 2015, 6, 925–929. [Google Scholar] [CrossRef]
  19. Guenzi, D.; Fratianni, S.; Boraso, R.; Cremonini, R. CondMerg: An open source implementation in R language of conditional merging for weather radars and rain gauges observations. Earth Sci. Inform. 2017, 10, 127–135. [Google Scholar] [CrossRef]
  20. Kembel, S.W.; Cowan, P.D.; Helmus, M.R.; Cornwell, W.K.; Morlon, H.; Ackerly, D.D. Picante: R tools for integrating phylogenies and ecology. Bioinformatics 2010, 26, 1463–1464. [Google Scholar] [CrossRef]
  21. Stanke, H.; Finley, A.O.; Weed, A.S.; Walters, B.F.; Domke, G.M. rFIA: An R package for estimation of forest attributes with the US Forest Inventory and Analysis database. Environ. Modell. Softw. 2020, 127, 104664. [Google Scholar] [CrossRef]
  22. Lemenkova, P.; Debeir, O. R Libraries for Remote Sensing Data Classification by K-Means Clustering and NDVI Computation in Congo River Basin, DRC. Appl. Sci. 2022, 12, 12554. [Google Scholar] [CrossRef]
  23. Seo, J.Y.; Lee, H.M. A study on statistical map of air pollution in Korea using R. In Proceedings of the 4th International Conference on Computer Applications and Information Processing Technology CAIPT2017, Kuta Bali, Indonesia, 8–10 August 2017. [Google Scholar]
  24. Setiawan, I. Time series air quality forecasting with R Language and R Studio. J. Phys. Conf. Ser. 2020, 1450, 012064. [Google Scholar] [CrossRef]
  25. Carslaw, D.C.; Ropkins, K. openair—An R package for air quality data analysis. Environ. Modell. Softw. 2012, 27–28, 52–61. [Google Scholar] [CrossRef]
  26. Syafei, A.D.; Fujiwara, A.; Zhang, J. Prediction model of Air Pollutant Levels Using Linear Model with Component Analysis. Int. J. Environ. Sci. Dev. 2015, 6, 519–525. [Google Scholar] [CrossRef]
  27. Sethi, J.K.; Mittal, M. An efficient correlation based adaptive LASSO regression method for air quality index prediction. Earth Sci. Inform. 2021, 14, 1777–1786. [Google Scholar] [CrossRef]
  28. Zheng, Y.; Xiuwen, Y.; Ming, L.; Ruiyan, L.; Zhangping, S.; Eric, C.; Tiannui, L. Forecasting Fine-Grained Air Quality Based on Big Data. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 2267–2276. [Google Scholar]
  29. Siwek, K.; Osowski, S. Data Mining Methods for Prediction of Air Pollution. Int. J. Appl. Math. Comput. Sci. 2016, 26, 467–478. [Google Scholar] [CrossRef]
  30. Zhang, J.; Ding, W. Prediction of Air Pollutants Concentration Based on an Extreme Learning Machine: The Case of Hong Kong. Int. J. Environ. Res. Pub. He. 2017, 14, 114. [Google Scholar] [CrossRef] [PubMed]
  31. Ibarra-Berastegi, G.; Elias, A.; Barona, A.; Saenz, J.; Ezcurra, A.; Diaz de Argandona, J. From diagnosis to prognosis for forecasting air pollution using neural networks: Air pollution monitoring in Bilbao. Environ. Modell. Softw. 2008, 23, 622–637. [Google Scholar] [CrossRef]
  32. Zhao, R.; Gu, X.; Xne, B.; Zhang, J.; Ren, W. Short period PM2.5 prediction based on multivariate linear regression model. PLoS ONE 2018, 13, e0201011. [Google Scholar] [CrossRef] [PubMed]
  33. Choi, S.-M.; Choi, H. Statistical Modeling for PM10, PM2.5 and PM1 at Gangneung Affected by Local Meteorological Variables and PM10 and PM2.5 at Beijing for Non- and Dust Periods. Appl. Sci. 2021, 11, 11958.
  34. Young, M.T.; Bechle, M.J.; Sampson, P.D.; Szpiro, A.A.; Marshall, J.D.; Sheppard, L.; Kaufman, J.D. Satellite-Based NO2 and Model Validation in a National Prediction Model Based on Universal Kriging and Land-Use Regression. Environ. Sci. Technol. 2016, 50, 3686–3694.
  35. Mani, G.; Viswanadhapalli, J.K.; Stonier, A.A. Prediction and forecasting of air quality index in Chennai using regression and ARIMA time series models. J. Eng. Res. 2022, 10, 179–194.
  36. Alsoltany, S.N.; Alnaqash, I.A. Estimating Fuzzy Linear Regression Model for Air Pollution Predictions in Baghdad City. J. Al-Nahrain Univ. 2015, 18, 157–166.
  37. Roy, S.S.; Paraschiv, N.; Popa, M.; Lile, R.; Naktode, I. Prediction of air-pollutant concentrations using hybrid model of regression and genetic algorithm. J. Intell. Fuzzy Syst. 2020, 38, 5909–5919.
  38. Sousa, S.I.V.; Martins, F.G.; Alvim-Ferraz, M.C.M.; Pereira, M.C. Multiple linear regression and artificial neural networks based on principal components to predict ozone concentrations. Environ. Modell. Softw. 2007, 22, 97–103.
  39. Basagaña, X.; Aguilera, I.; Rivera, M.; Agis, D.; Foraster, M.; Marrugat, J.; Elosua, R.; Künzli, N. Measurement Error in Epidemiologic Studies of Air Pollution Based on Land-Use Regression Models. Am. J. Epidemiol. 2013, 178, 1342–1346.
  40. Selvi, S.; Chandrasekaran, M. Performance evaluation of mathematical predictive modeling for air quality forecasting. Cluster Comput. 2019, 22, 12481–12493.
  41. Iskandaryan, D.; Ramos, F.; Trilles, S. Air Quality Prediction in Smart Cities Using Machine Learning Technologies Based on Sensor Data: A Review. Appl. Sci. 2020, 10, 2401.
  42. Briggs, D.J.; Collins, S.; Elliot, P.; Fischer, P.; Kingham, S.; Lebret, E.; Pryl, K.; Van Reeuwijk, H.; Smallbone, K.; Van der Veen, A. Mapping urban air pollution using GIS: A regression-based approach. Int. J. Geogr. Inf. Sci. 1997, 11, 699–718.
  43. Hochadel, M.; Heinrich, J.; Gehring, U.; Morgenstern, V.; Wichmann, H.E.; Kuhlbusch, T.; Link, E.; Kramer, U. Predicting long-term average concentrations of traffic-related air pollutants using GIS-based information. Atmos. Environ. 2006, 40, 542–553.
  44. Zhou, X.; Tong, W.; Li, L. Deep learning spatiotemporal air pollution data in China using data fusion. Earth Sci. Inform. 2020, 13, 859–868.
  45. Morandat, F.; Hill, B.; Osvald, L.; Vitek, J. Evaluating the Design of the R language. In ECOOP 2012—Object-Oriented Programming; Lecture Notes in Computer Science; Noble, J., Ed.; Springer: Berlin/Heidelberg, Germany, 2012; Volume 7313, pp. 104–131.
  46. Environmental Protection Agency, Ministry of Environmental Protection, Republic of Serbia. National Network of Automatic Stations for Air Quality Monitoring—Raw Data Obtained from Measuring Stations. Available online: http://www.amskv.sepa.gov.rs/stanicepodaci.php (accessed on 1 January 2021).
  47. Environmental Protection Agency, Ministry of Environmental Protection, Republic of Serbia. National Network of Automatic Stations for Air Quality Monitoring—Data View. Available online: http://www.amskv.sepa.gov.rs/pregledpodatakazbirni.php?lng=en (accessed on 1 January 2021).
  48. Environmental Protection Agency, Ministry of Environmental Protection, Republic of Serbia. National Network of Automatic Stations for Air Quality Monitoring—Criteria for Pollution Classification. Available online: http://www.amskv.sepa.gov.rs/kriterijumi.php?lng=en (accessed on 31 August 2022).
  49. Jacob-Lopes, E.; Queiroz Zepka, L.; Costa Deprá, M. Methods of evaluation of the environmental impact on the life cycle. In Sustainability Metrics and Indicators of Environmental Impact, Industrial and Agricultural Life Cycle Assessment; Elsevier: Amsterdam, The Netherlands, 2021; pp. 29–70.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
