Statistical Approaches for Forecasting Primary Air Pollutants: A Review

Abstract: Air pollutant forecasting can be used to quantitatively estimate pollutant reduction trends. Combining bibliometrics with the evolutionary tree and Markov chain methods can achieve a superior quantitative analysis of research hotspots and trends. In this work, we adopted a bibliometric method to review the research status of statistical prediction methods for air pollution, used evolutionary trees to analyze the development trend of such research, and applied the Markov chain to predict future research trends for major air pollutants. The results indicate that papers mainly focused on the effects of air pollution on human diseases, urban pollution exposure models, and land use regression (LUR) methods. Particulate matter (PM), nitrogen oxides (NOx), and ozone (O3) were the most investigated pollutants. Artificial neural network (ANN) methods were preferred in studies of PM and O3, while LUR methods were more widely used in studies of NOx. Additionally, multi-method hybrid techniques gradually became the most widely used approach between 2010 and 2018. In the future, the statistical prediction of air pollution is expected to be based on a mixed method to simultaneously predict multiple pollutants, and the interaction between pollutants will be the most challenging aspect of research on air pollution prediction. The research results summarized in this paper provide technical support for the accurate prediction of atmospheric pollution and the emergency management of regional air quality.


Introduction
Air pollution not only has direct impacts on animals, plants, and human health but also has indirect negative effects on ecosystems and material circulation. Scholars have found that the non-accidental mortality of patients with cardiovascular and respiratory diseases is closely related to the concentration of particulate matter (PM) in the atmosphere [1], and an increase in black carbon levels can increase the mortality of coronary heart disease [2]. Moreover, mountain fires, the ozone hole, and global warming are considered relevant to air quality deterioration [3,4]. To better understand and manage the risks associated with air pollution, an accurate prediction of the trend of air pollution is crucial.
Many methods have been developed for the prediction of air pollution. These can be roughly divided into three categories: deterministic models, traditional statistical methods, and artificial intelligence (AI) methods. Deterministic models have been developed into third-generation air quality models based on the "single atmosphere". This kind of model can simulate physical and chemical atmospheric processes at various three-dimensional scales. The representative deterministic models are the Community Multiscale Air Quality modeling system (CMAQ), the Atmospheric Dispersion Modeling System (ADMS) model, and the Weather Research and Forecasting model coupled with Chemistry (WRF-Chem) model [5]. Deterministic models have allowed a series of important advances in the field of air quality prediction; for example, Hu's study demonstrated the ability of CMAQ to reproduce severe air pollution in China [6]; Mathur extended CMAQ to simulate the distribution of ozone and PM throughout the northern hemisphere [7]; and Rafee applied the WRF-Chem model to evaluate the contribution of mobile, fixed, and biological sources to air pollution in the Amazon rainforest, with the results showing that the air pollution plume from the city of Manaus was mainly transported to the west and southwest [8]. In view of the fact that air pollution forecasting has received increased attention from research communities and governments, the World Meteorological Organization published a classic study on training material and best practice regarding the use of 3D chemical weather and air quality forecasting (CW-AQF) models for operational forecasting, early warning, and policymaking [9]. If the use of statistical methods can be further increased, better results may be achieved in terms of the prediction accuracy and cost-performance ratio of calculation methods [10].
Statistical methods have been developed from traditional multiple linear regression methods. With the development of computer science and the continuous improvement and innovation of statistical prediction methods, traditional regression methods and spatial statistical methods have been combined into a complex analysis method. Artificial intelligence methods, which include machine learning methods such as artificial neural networks (ANNs), deep learning (DL), and support vector machines (SVMs) and which are data-adaptive and data-driven at their core, have emerged in recent years. Many studies have shown that the accuracy of artificial intelligence technology is superior to that of traditional statistical methods [11].
Some previous studies have reviewed these methods that are used to predict air pollution. For instance, Bai et al. [12] introduced detailed statistical methods for the prediction of air pollution. They listed the basic principles of these methods but focused less on the advantages, disadvantages, and application effects of the methods. Rybarczyk and Zalakeviciute [10] analyzed 46 papers related to machine learning and concluded that researchers preferred to use integrated learning and regression for estimation applications but tended to use neural networks and support vector machines for prediction applications. However, due to the limited number of papers that were considered, this review could only represent the research status at that time and failed to provide clues on the research inclination and tendency of prediction methods for different pollutants. Therefore, it is necessary to perform a quantitative analytical review of a large number of studies involving air pollution prediction to help researchers in this field to locate the research focus more quickly and accurately.
Bibliometrics is an interdisciplinary discipline that uses mathematical and statistical methods to quantitatively analyze all knowledge carriers. Bibliometrics has been applied to perform literature reviews on environmental topics, such as urban heat islands, air pollution source analysis, and the impact of air pollution on human health [13,14]. Beyond obtaining background knowledge of the research topic, bibliometrics helps to elucidate the connections between different literature subjects. However, the identification of the future prospects of research topics requires the careful interpretation of the results of bibliometric analysis and the visualization of technology. Therefore, it is necessary to use another quantitative analysis method to analyze the evolution mechanism of spatiotemporal data. Evolutionary trees can be a good solution to this problem. When applied in biology, evolutionary trees are drawn based on the distance of the biological genetic relationship and organisms are placed on the branches of a tree chart. These diagrams concisely display the evolutionary processes of, and genetic relationships between, organisms. Additionally, these features also make evolutionary trees an effective tool for spatiotemporal data analysis. Graphical Phylogenetic Analysis (GraPhlAn) is used in biocoenology and genetics research [15]. Additionally, another quantitative analysis method is needed to determine the development speed of different academic fields. The Markov chain method can quantitatively show the process of urban development and transformation [16].
In this paper, we use bibliometrics to quantitatively review the research status of statistical methods for the prediction of air pollution in recent years. We firstly analyze all collected papers in terms of the number of publications, journal types, and subject categories to obtain an overall understanding of the development status of this discipline. Then, according to the cooperation among countries, authors, and institutions, we identify research gaps and cooperative relationships within and outside of China. An evolutionary tree is drawn to analyze the method development trend of major pollutants and statistical prediction methods in different time periods. Finally, the Markov chain method is applied to quantitatively predict the development trend of research on the prediction of major air pollutants.

Data Sources and Preprocessing
Using the Web of Science Core Collection, which was developed by the Institute for Scientific Information in the United States, a search was performed for articles and reviews with the keywords "air pollution" and "prediction"; this returned 5437 related studies (Table 1), and the full records of these were downloaded. Microsoft Excel and BibExcel were used to extract the title (TI), abstract (AB), author (AU), author information (CI), publication time (PY), subject category (CS), published journal (SO), keywords (DE), and references (CR) of each article. To identify the research objects and prediction methods, the titles, abstracts, and keywords of all the articles were studied. Finally, a total of 727 articles were selected that used statistical prediction methods to study PM, ozone (O3), nitrogen oxides (NOx), the air quality index (AQI), and composite pollutants.

Table 1.
Database: Web of Science Core Collection
Retrieval method: TS = (("air pollutants" OR "air pollution" OR "atmospheric pollutants" OR "atmospheric pollutant") AND (Predict OR Prediction OR Forecast OR Forecasts))
Timespan: 1990-2018
Document type: Articles and reviews

Bibliometric Analysis
Bibliometrics is a scientific analysis method that uses mathematical and statistical methods to quantitatively analyze the quantitative relationship, distribution, and changing rule of the literature on a given topic. The full record of the published literature (books, journals, newspapers, conference articles, etc.) is fundamental for obtaining the data structure needed for bibliometric research. Through mathematical statistical analysis, the rule of article publishing, research trends, institutional cooperation, etc., can be quantitatively analyzed. In the bibliometric analysis performed in this study, the first step was extracting and counting the features of the subject categories, publishing journals, and publication dates. This information can provide the basic situation of research on air pollution prediction. Then, VOSviewer, a software program used for constructing and visualizing bibliometric networks, was used to analyze cooperation, co-occurrence, and co-citation. By extracting research institutions (top 120 with a frequency of more than five times), keywords (top 100 with a frequency of more than 10 times), and references (top 120 with a frequency of more than 20 times), the cooperation network of research institutions, the keyword co-occurrence network, and the literature co-citation network were drawn. The connections within these networks depict the structure, evolution, and cooperative relationships of the research field of air pollution prediction.
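The co-occurrence counting that underlies a keyword network can be sketched in a few lines of Python. This is only an illustration of the idea, not the procedure VOSviewer implements: the function name `cooccurrence`, the `min_freq` cut-off semantics, and the toy records are all assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(keyword_lists, min_freq=1):
    """Count keyword pairs that appear together in the same paper.

    keyword_lists: one list of keywords per paper (hypothetical input).
    min_freq: keep only keywords that occur in at least this many papers.
    """
    # Normalise case and drop repeats within a paper so each paper counts once.
    papers = [sorted({kw.strip().lower() for kw in kws}) for kws in keyword_lists]

    # Per-keyword frequency, used for the frequency cut-off.
    freq = Counter(kw for paper in papers for kw in paper)
    kept = {kw for kw, n in freq.items() if n >= min_freq}

    # Each unordered pair of retained keywords in a paper adds one to its link weight.
    links = Counter()
    for paper in papers:
        retained = [kw for kw in paper if kw in kept]
        links.update(combinations(retained, 2))
    return freq, links


# Toy records, invented for illustration only.
records = [
    ["PM2.5", "neural network", "air quality"],
    ["PM2.5", "ozone", "neural network"],
    ["NOx", "land use regression", "exposure assessment"],
    ["NOx", "land use regression", "air quality"],
]
freq, links = cooccurrence(records, min_freq=2)
for (a, b), weight in links.most_common(2):
    print(a, "|", b, "|", weight)
```

Because keywords are sorted within each paper, every pair is stored under a single key regardless of the order in which the keywords were listed. Clustering the resulting weighted links is a separate step that this sketch does not attempt.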

Evolutionary Tree Analysis
In this study, the evolutionary tree analysis was divided into two parts: one involving an evolutionary tree and one involving Markov chain analysis. In biology, the evolutionary tree method is used to represent evolutionary relationships among species. By placing species on a tree chart with branches, the evolutionary processes of, and genetic relationships between, the species can be displayed concisely. The evolutionary tree was drawn using GraPhlAn, a Python-based command-line tree-drawing tool developed by the Huttenhower Laboratory. This tool is frequently applied in phylogenetic and taxonomic research and can directly and concisely display the systematic classification structure in the form of a circular tree. In this study, borrowing the idea of taxonomy from biology, the 727 articles were classified into three levels: the first level represents the pollutant object studied in the article; the second level represents the statistical prediction method used in the article; and the third level represents each individual article (see Table A1 in Appendix A for details). The GraPhlAn tool was used to draw evolutionary trees for articles published in 1990-1999, 2000-2009, and 2010-2018.
The Markov chain method is used to describe the transition process from one state to another [17]. This transition needs to meet the requirement of being "memoryless"; that is, the probability distribution of the next state is determined only by the current state. In this paper, we group papers that study the same pollutant and use the same prediction method into one category and define five levels based on the number of publications: Level 1: 0-1 papers; Level 2: 2-5 papers; Level 3: 6-10 papers; Level 4: 11-30 papers; and Level 5: 31 or more papers. The Geotree software was applied to draw the Markov chain from 2000-2009 to 2010-2018. Then, the changes in development status were quantitatively analyzed for each level in the two above periods to identify research hotspots.
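The level assignment and the level-to-level transitions described above can be sketched as follows. This is a minimal illustration, not the Geotree workflow used in the study. The names `level_of` and `transition_matrix` are hypothetical, Level 5 is assumed to start at 31 papers so that the bins leave no gap, and the example counts mix PM figures quoted in the results with O3 ARIMA counts invented to match the reported level change.

```python
from collections import Counter

# Level boundaries as (level, lowest count, highest count); None means no upper bound.
# Assumption: Level 5 starts at 31 papers so that every count falls in exactly one bin.
LEVELS = [(1, 0, 1), (2, 2, 5), (3, 6, 10), (4, 11, 30), (5, 31, None)]


def level_of(n_papers):
    """Map a publication count to its level (1-5)."""
    for level, low, high in LEVELS:
        if n_papers >= low and (high is None or n_papers <= high):
            return level
    raise ValueError("publication count must be a non-negative integer")


def transition_matrix(counts_before, counts_after):
    """Estimate level-to-level transition shares between two periods.

    Both arguments map a (pollutant, method) category to a paper count.
    Row i, column j holds the share of categories at level i+1 in the first
    period that are at level j+1 in the second period.
    """
    moves = Counter()
    for category, before in counts_before.items():
        after = counts_after.get(category, 0)
        moves[(level_of(before), level_of(after))] += 1

    matrix = []
    for i in range(1, 6):
        row_total = sum(moves[(i, j)] for j in range(1, 6))
        matrix.append(
            [moves[(i, j)] / row_total if row_total else 0.0 for j in range(1, 6)]
        )
    return matrix


# PM counts as quoted in the results; the O3 ARIMA counts are invented placeholders.
before = {("PM", "hybrid"): 1, ("PM", "ANN"): 17, ("PM", "LUR"): 4, ("O3", "ARIMA"): 2}
after = {("PM", "hybrid"): 36, ("PM", "ANN"): 40, ("PM", "LUR"): 39, ("O3", "ARIMA"): 1}

matrix = transition_matrix(before, after)
for level, row in enumerate(matrix, start=1):
    print("Level", level, row)
```

Each row is normalised by the number of categories that started at that level, so a row of zeros simply means that no category in the example started there.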

Basic Information
Between 1990 and 1999, the number of published studies regarding air pollutant prediction increased from 6 to 75, with an average annual growth of 6.9 studies; between 2000 and 2009, the number of published studies rose from 122 to 283, with an average annual growth of 16.1 studies; and between 2010 and 2018, the number of published studies grew from 230 to 597, with an average annual growth of 36.7 studies. In the following, the bibliometric results for these three periods are discussed one by one (Figure 1).
An analysis of the top 10 disciplines producing papers involving air pollution prediction in the three periods shows that "Environmental Sciences & Ecology" is always the core (Table A1). In all three periods, the number of studies published in the discipline of "Environmental Sciences & Ecology" is basically consistent with the growth trend of the total number of papers involving air pollutant prediction. These papers account for 58.67% of the total number of published papers, and the proportion has remained above one half over the years (Figure 1). In the three periods, the disciplines of "Meteorology & Atmospheric Sciences", "Engineering", and "Public, Environmental & Occupational Health" also rank at the top of studies involving air pollutant research. In all three periods, "Environmental Sciences & Ecology", together with the above three disciplines, was always in the top four.
The discipline of "Public, Environmental & Occupational Health" developed rapidly in the third period; although it ranked fourth in this period, the number of papers published in this discipline was close to that of the third-ranked discipline. In 2000-2009, the discipline of "Chemistry" began to enter the top 10 disciplines; additionally, it kept increasing in the third period, which indicated a growing research focus on the relationship between air pollutant prediction and chemistry.
Regarding journals, "Atmospheric Environment" occupied first place in terms of publication number in all of the three periods, accounting for 11.72% of the total number of publications (Table A2). The journal "Atmospheric Chemistry and Physics" entered the top 10 in the second period and rose to second place in the third period. This indicates a significant growth in the influence of this journal from 2010 to 2018. In the third period, "Atmospheric Pollution Research", "Environmental Research", and "Aerosol and Air Quality Research" also developed rapidly and entered the top 10 journals publishing studies involving air pollutant prediction.
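The average annual growth figures quoted at the start of this section can be reproduced with simple arithmetic. In the sketch below, the divisor of 10 is inferred from the reported values rather than stated in the text, so it should be read as an assumption.

```python
# (first-year count, last-year count) per period, as reported in the text.
periods = {
    "1990-1999": (6, 75),
    "2000-2009": (122, 283),
    "2010-2018": (230, 597),
}

# Divisor inferred from the reported averages, not stated in the source.
ASSUMED_DIVISOR = 10

growth = {
    name: round((last - first) / ASSUMED_DIVISOR, 1)
    for name, (first, last) in periods.items()
}
print(growth)
```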

Analysis of Research Institutions and Co-Citation
Throughout 1990-2018, the research institutions that were involved in studies of air pollution prediction can be divided into five clusters based on their cooperative relationships: Cluster 1 is led by the Chinese Academy of Sciences. Members of this cluster have a certain number of papers published and are closely connected ( Figure A1). Cluster 2 is led by the United States Environmental Protection Agency (US EPA), and the number of papers published by its members is slightly less than in Cluster 1. The US EPA and the University of Michigan cooperated frequently with Cluster 1 and Cluster 2. Both Cluster 3 and Cluster 4 are led by universities in the United States: Cluster 3 is led by Harvard University, and its members are closely connected; Harvard University cooperates most with Cluster 1, while the University of North Carolina cooperates most with Cluster 4; Cluster 4 is led by the University of Washington, which maintains a relationship with the other four clusters and plays an important role in connecting the other clusters (examples include the University of California, Los Angeles; the University of Toronto; and the University of British Columbia). Cluster 5 mainly includes research institutions in Europe and developing countries; among these, the Indian Institutes of Technology and the University of Utrecht (The Netherlands) published the most papers, although not as many as in other clusters. The Indian Institutes of Technology acts as a cooperative bridge between institutions of Cluster 5 and other clusters.
The most frequently cited references in the selected research papers involving air pollutant prediction can also be divided into five clusters. Cluster 1 groups studies that form an important basis for geospatial estimation methods of air pollution. Two papers conducted detailed evaluations and comparisons of land use regression (LUR) [18,19], while two other studies applied Geographic Information Systems (GISs) to this field [20,21]. Cluster 2 consists of studies on the relationship between air pollution and human health. In this cluster, studies by Pope focused on the effects of PM on cardiopulmonary function [22,23], while Dockery explored the relationship between air pollution and mortality [24]. Studies in Cluster 3 investigated the use of the satellite aerosol optical thickness to predict PM pollutants [25]. The studies in Cluster 4 involve the application of ANNs to the prediction of PM pollutants [26]. Lastly, Cluster 5 consists of studies that apply deterministic models for air pollution prediction (Figure A2) [27].

Keyword Analysis
Keywords are a summary of the theme of an article. Through the co-occurrence analysis of keywords, one can identify the key points in a research field and the relationship between the points. In this study, the keywords of the air pollution prediction papers were automatically divided into three clusters based on the frequency of co-occurrence. From 1990 to 2018, the analyzed studies were mainly based on three pollutants: PM, ozone, and nitrogen oxides (Figure 2). The co-occurrence of PM and ozone is close, and both belong to Cluster 1. Additionally, air quality is also an important research object in Cluster 1; this topic involves a wide range of prediction methods, including artificial neural networks (i.e., machine learning), regression models, and WRF-Chem (a chemical transport model).
The main pollutants discussed in Cluster 2 are nitrogen oxides, which are studied in exposure assessment and pollution diffusion research. The main prediction methods are land use regression, geographic information systems, Kriging, etc. Additionally, this cluster focuses on topics regarding spaces such as cities, indoor spaces, and traffic lines. Cluster 3 contains studies on the impacts of air pollution, such as climate change and health issues. In terms of health, studies in this direction are more focused on lung function, pregnancy, and fetal health.

Evolutionary Tree Analysis
In this step, we first examine the papers that studied PM, NOx, O3, AQI, and composite pollutants. The numbers of such papers published in the three periods are as follows: 1990-1999: 37; 2000-2009: 195; and 2010-2018: 495. As shown in Figure 3, the methods applied for the statistical prediction of air pollutants have changed considerably over the past 30 years.

Figure 3. Evolutionary tree of air pollution prediction research from 1990 to 2018 (note: "a" represents artificial neural network (ANN), "b" represents multiple linear regression (MLR), "c" represents the Lagrangian, "d" represents the Gaussian, "e" represents correlation, "f" represents multiple (using multiple methods for comparison), "g" represents land use regression (LUR), "h" represents Bayesian, "i" represents the autoregressive integrated moving average model (ARIMA), "j" represents the generalized additive model (GAM), "k" represents support vector machine (SVM), and "l" represents hybrid (combining multiple methods for prediction)).
In 1990-1999, the analyzed articles were more focused on multiple pollutants (referring to the simultaneous prediction of multiple pollutants in one paper), which consisted of 3-5 pollutants (around 43%), while only a few studied single pollutants (Figure 3a). In this period, the Gaussian method and ANN were the methods that were most commonly used for the simultaneous prediction of multiple pollutants, while multiple linear regression (MLR) and correlation analysis were used for the prediction of O3.
During 2000-2009, with the development of research methods, the analyzed studies became mostly focused on single pollutants, and research on NOx (57 papers) and PM (55 papers) grew rapidly, exceeding and equaling, respectively, the number of papers on O3 (55 papers) (Figure 3b). Meanwhile, the proportion of papers considering composite pollutants declined to 12.8% (25 of the 195 papers), compared with around 43% in 1990-1999. In this period, the papers considering composite pollutants mainly used ANN and LUR for prediction; these two methods were also used for studies of PM and NOx, while studies of O3 only applied ANN. Additionally, the MLR method gradually began to be widely applied in this period.
In 2010-2018, PM pollutants became the mainstream of research interest (Figure 3c). Between 1990 and 2018, the number of studies on PM pollutants grew steadily (231 papers), while the number of studies on O3 increased more slowly (61 papers). In 2010-2018, more studies on multiple pollutants were published compared to 2000-2009. This indicates that research on air pollution prediction has recently begun to focus more on the correlation between pollutants. During this period, hybrid methods for air pollution prediction rapidly gained popularity, ranking third in the methods used in studies of PM, O3, and NOx; this was also the most widely used prediction method in studies using the AQI. Additionally, the generalized additive model (GAM) and SVM also attracted increasing attention.

Markov Chain Analysis
The Markov chain method quantitatively displays the development status of prediction methods for research on different air pollutants (Figure 4). For PM prediction, the hybrid method is the fastest-developing method, growing from Level 1 to Level 5. Additionally, another four methods developed rapidly; the random forest (RF) and SVM methods grew from Level 1 to Level 3, and the multiple and LUR methods increased from Level 2 to Level 4 and Level 5, respectively. The LUR method had the second-fastest growth rate, increasing from Level 2 (four papers) to Level 5 (39 papers). During 2000-2009, the ANN method was at Level 4 (17 papers), and it increased to Level 5 (40 papers) in 2010-2018. The extreme value theory (EVT) is the only method to show a decreasing trend.
For O3 prediction, the hybrid method increased from Level 1 to Level 3. However, the autoregressive integrated moving average (ARIMA), Lagrangian, and adaptive neuro-fuzzy inference system (ANFIS) methods dropped from Level 2 to Level 1. The hybrid method and multiple method for NOx prediction both jumped from Level 1 to Level 3. However, ANN, MLR, ARIMA, and four other methods decreased to varying degrees, some of them falling from Level 2 to Level 1 and others falling from Level 3 to Level 2. The growth trend of prediction methods for multiple pollutants was similar to that of studies on NOx. The main difference is that the LUR method increased from Level 2 to Level 4 in NOx studies. For pollutant prediction based on the AQI, the hybrid method is the only one whose level increased.
The hybrid method is especially preferred in studies of PM, where the number of published papers increased from 1 to 36. The application of the hybrid method also grew for research on O3 and NOx, especially when combined with multiple pollutants and the AQI. For studies of O3 and NOx, the number of papers using the hybrid method increased from an average of 1 to an average of 5 and 5.5, respectively, while for studies of multiple pollutants and the AQI, the number of papers using the hybrid method increased from an average of 1 to an average of 5.5 and 8, respectively (Figure 4).

Discussion
This study applies bibliometrics to quantitatively analyze the literature on air pollution prediction (publications, research institutions, keywords, etc.) and integrates the statistical spatiotemporal analysis with the evolutionary tree method. By reviewing the spatiotemporal pattern in the literature, we visualize the inter-connection of multi-dimensional data and the mutations of the evolutionary pathway. The results indicate the hotspots and trends of air pollution prediction research. Regarding air pollution prediction, PM, O3, and NOx are the most widely studied single pollutants. Additionally, multiple pollutants and the AQI are also popular research targets. In the following, we further discuss the research hotspots and the difficulties of prediction research for different pollutants.

Particulate Matter (PM)
PM2.5 and PM10 are the pollutants of greatest concern. In recent years, the number of studies on PM1 or finer PM has increased [28,29]. For these pollutants, when conducting statistical prediction, ANNs are the most widely applied method. The ANN interpolation method has significant advantages in most cases, especially under the condition of limited air quality network density [30]. However, for solving the problem of air pollution source allocation, the BP-ANN method requires as many emission sources as possible to complete training [31].
In 2000, a multi-layer neural network method for predicting the hourly PM2.5 concentration at a fixed point had a prediction error of around 30-60% [32]. By 2018, with the development of a variety of neural network algorithms, the prediction error had been greatly reduced and was far lower than that of general statistical prediction methods [33]. For example, the Self-Organizing Deep Belief Network (SODBN), which is based on a growth and pruning algorithm, can dynamically adjust the weights in the process of structure self-organization, effectively shortening the running time and improving the accuracy [34]. An extended long short-term memory (LSTM) neural network model that considers temporal and spatial correlation automatically extracts the inherent useful features of historical air pollutant data through LSTM layers and incorporates auxiliary data (including meteorological data and time-stamp data) to enhance performance. For the simulation of the hourly PM2.5 concentration, this method is superior to other statistical models, such as the spatiotemporal deep learning (STDL) model, the time-delay neural network (TDNN) model, the autoregressive moving average (ARMA) model, and the support vector regression (SVR) model [35].
Land use regression models can obtain the distribution of air pollutant concentrations at a fine scale using a small number of monitoring stations and prediction factors, without air pollution source data. LUR is therefore a promising technique for the high-spatial-resolution prediction of ambient air pollutant concentrations. However, due to the strong heterogeneity of the urban land surface, the LUR model cannot fully characterize the spatial characteristics of urban pollutant emissions. Previous studies have shown that this method can explain only 61% to 64% of the spatial distribution trend of PM2.5 concentration [36]. The choice of the boundary buffer distance is key in LUR models since it determines the predicted spatial distribution of PM2.5 [37]. As research on the factors influencing surface PM concentration has built on the result layers obtained by the LUR method, an increasing number of relevant factors and their interactions have been used to predict changes in PM concentration. For example, Liu et al. [38] used meteorological factor regression (MFR) and backpropagation neural network (BPNN) modeling techniques combined with LUR to simulate the temporal variation of PM10. Li et al. [37] studied the combined effect of aerosol optical depth (AOD) and LUR and found that introducing LUR can improve model performance in urban-rural transition areas, where land use characteristics change rapidly, while AOD can improve model performance in spring.
Multiple linear regression (MLR) is a mature statistical forecasting method that quantitatively describes the linear relationship between a target variable and multiple independent variables in order to predict the target. Since many factors can affect the particle concentration, the MLR model is suitable for the prediction of particle concentration. In building a multivariate linear model, the selection of influencing factors is crucial. The results of a previous study show that humidity, temperature, wind speed, wind direction, carbon monoxide, and ozone are the main factors causing PM10 changes in Peninsular Malaysia [39]. The least squares method is the most commonly used method to estimate parameters in MLR models. Its performance is slightly better than that of the maximum likelihood method and the squares estimation method [40]. Testing an MLR model mainly involves a goodness-of-fit test, significance tests of the individual variables, and a significance test of the model as a whole [41]. The MLR equation has a high prediction accuracy for PM2.5 concentrations in high-emission areas; for example, Hou et al. [42] found that the correlation coefficient between the predicted and measured PM2.5 concentrations passed the 95% confidence test.
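To make the least squares step concrete, the following Python sketch fits an MLR model by solving the normal equations (X^T X) b = X^T y. It is a minimal illustration under stated assumptions: the three predictors (humidity, temperature, wind speed), the synthetic data, and the helper names `solve_linear_system` and `fit_mlr` are all hypothetical and do not come from any of the cited studies.

```python
# Illustrative sketch: ordinary least squares for a multiple linear
# regression, solved through the normal equations (X^T X) b = X^T y.
# Predictor choice and all numbers are synthetic.

def solve_linear_system(a, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_mlr(rows, y):
    """Return [intercept, coef_1, ..., coef_k] minimising squared error."""
    design = [[1.0] + list(r) for r in rows]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)]
           for i in range(k)]
    xty = [sum(d[i] * t for d, t in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Synthetic predictors: (relative humidity %, temperature degC, wind m/s).
predictors = [
    (60.0, 25.0, 2.0),
    (70.0, 27.0, 1.5),
    (80.0, 30.0, 1.0),
    (65.0, 26.0, 3.0),
    (75.0, 24.0, 2.5),
    (55.0, 29.0, 0.5),
]
# Targets generated from a known linear rule so the fit can be checked.
true_coefs = [10.0, 0.3, 1.2, -4.0]
pm10 = [
    true_coefs[0] + sum(c * v for c, v in zip(true_coefs[1:], p))
    for p in predictors
]

coefs = fit_mlr(predictors, pm10)
print([round(c, 3) for c in coefs])
```

Because the synthetic targets are noise-free, the fitted coefficients should match `true_coefs` up to floating-point error; with real monitoring data, the goodness-of-fit and significance tests mentioned above would be needed to judge the model.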
Furthermore, recently, other methods for air pollution prediction have emerged, such as Bayesian maximum entropy (BME) and SVM. Compared with the vector autoregressive moving average (VARMA), ARIMA, and multi-layer perceptron (MLP) neural network models, the SVM model performs better for predicting air pollutant concentrations in the following month [43]. Unlike SVM models, Bayesian models are seldom compared with other methods. Research shows that the BME method can effectively improve the spatiotemporal estimation accuracy of PM2.5 by combining PM10 and total suspended particulate (TSP) data. Yang and Christakos [44] used the BME method to evaluate the spatiotemporal variability of PM2.5 concentration in western Shandong Province and found that this region has experienced long-term heavy PM2.5 pollution and experiences much more serious pollution in winter than in other seasons.
The application of hybrid methods for the prediction of air pollution has developed rapidly since 2010. The ANN method is the one most widely coupled with other methods. For example, a general regression neural network (GRNN) is combined with ensemble empirical mode decomposition (EEMD) to form the EEMD-GRNN model, which integrates data preprocessing with data analysis [45]; the EEMD part decomposes the original PM2.5 data into several intrinsic mode functions (IMFs), and the GRNN part predicts each IMF. This combination produces better results than the GRNN model, MLR model, principal component regression (PCR) model, and traditional ARIMA model. Combining a traditional ANN model with principal component analysis, radial basis function network, and K-means clustering methods can improve the training speed and prediction performance [15,46]. The combination of a linear regression model and an ANN model can better predict the daily average concentration of PM10 on the following day than the MLR model alone [47]. In another approach that incorporates air quality trajectory analysis, the original time series is decomposed by wavelet transform into several sub-series with less variability, and the ANN is then used to predict the PM2.5 concentration; this hybrid approach can reduce the root-mean-square error (RMSE) by 40% and improve the prediction of high values [48]. In periods of high PM concentration, the performance gain from combining numerical prediction with statistical methods is more prominent [49].
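The decomposition-based hybrids described above share a common pattern: split the series into simpler sub-series, forecast each one separately, and add the forecasts together. The Python sketch below shows only this pattern. A centred moving average stands in for EEMD or a wavelet transform, and naive forecasters stand in for the neural network; both substitutions, the function names, and the data are assumptions made for illustration and do not reproduce the cited models.

```python
# Illustrative sketch of the "decompose, forecast each part, recombine"
# pattern. The moving-average split and the naive forecasters are
# simplified stand-ins, not EEMD, wavelets, or a neural network.

def moving_average(series, window):
    """Centred moving average; the window shrinks near the series edges."""
    half = window // 2
    smoothed = []
    for i in range(len(series)):
        chunk = series[max(0, i - half): i + half + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def decompose(series, window=3):
    """Split a series into a smooth component and a residual component."""
    smooth = moving_average(series, window)
    residual = [x - s for x, s in zip(series, smooth)]
    return [smooth, residual]


def forecast_smooth(component):
    """Naive one-step forecast: carry the last value forward."""
    return component[-1]


def forecast_residual(component):
    """Naive one-step forecast: mean of the residual component."""
    return sum(component) / len(component)


def hybrid_forecast(series, window=3):
    """Forecast each sub-series separately, then sum the forecasts."""
    smooth, residual = decompose(series, window)
    return forecast_smooth(smooth) + forecast_residual(residual)


# Synthetic hourly PM2.5 values in ug/m3 (made up for illustration).
pm25 = [35.0, 38.0, 42.0, 40.0, 45.0, 50.0, 48.0, 52.0]
parts = decompose(pm25)
rebuilt = [a + b for a, b in zip(*parts)]  # components add back to the data
print(round(hybrid_forecast(pm25), 2))
```

The property worth noting is that the components sum back to the original series, so no information is discarded by the decomposition; the gain in the cited studies comes from each sub-series being easier to model than the raw signal.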
In addition to the ANN method, LUR is also applied in hybrid methods for air pollution prediction. The hybrid Kriging/LUR model used by Wu et al. [50] achieved a better prediction of PM2.5 concentration than either the LUR method or the Kriging method alone; its annual model normalized cross-validation R² value was 0.85, which is better than those of LUR (0.66) and Kriging (0.82). Moreover, a hybrid Grey Markov/LUR model achieved a significantly higher accuracy of PM10 concentration prediction for urban environments than the LUR model; the average relative error was 5.13% (vs. 24.09% for LUR) and the RMSE was 5.50 µg/m³ (vs. 21.31 µg/m³ for LUR) [51]. Overall, hybrid approaches have become a research hotspot in air pollution prediction, and their use is growing rapidly.
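The two error measures used in this comparison can be computed as in the following Python sketch. The function names and the sample concentrations are hypothetical, and the relative error is assumed here to be the mean of |predicted - observed| / observed expressed as a percentage, which may differ in detail from the definition used in [51].

```python
import math


def rmse(observed, predicted):
    """Root-mean-square error, in the same units as the data."""
    squared = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mean_relative_error(observed, predicted):
    """Mean of |predicted - observed| / observed, as a percentage."""
    ratios = [abs(p - o) / o for o, p in zip(observed, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Synthetic PM10 concentrations in ug/m3 (made up for illustration).
observed = [80.0, 100.0, 120.0, 90.0]
predicted = [84.0, 95.0, 126.0, 90.0]

print(round(rmse(observed, predicted), 3))
print(round(mean_relative_error(observed, predicted), 2))
```

RMSE penalises large individual errors more heavily, while the relative error normalises by the observed level, so the two measures can rank models differently when pollution episodes are present.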

Ozone (O3)
Deterministic models are the main models applied in the formulation of air quality control policies (the US EPA requires an air quality simulation to have an accuracy within 15%), and they are widely used in the study of O3 prediction. In recent years, many statistical methods have also been applied to the prediction of O3, most commonly ANN, MLR, and multi-method coupling models. In the 1990s, the MLR and Lagrangian particle models were two representative methods that were widely used to predict ozone concentration. The Lagrangian particle model has been used to simulate the motion and diffusion trajectory of surface ozone [52,53], while MLR has been applied to develop ozone prediction models. The results of the aforementioned studies show that there is a positive correlation between temperature and ozone concentration when the temperature is lower than 27 °C and a negative correlation when the temperature is higher than 27 °C [54].
In the 21st century, ANNs have gradually become the mainstream method for ozone prediction, while the use of the Lagrangian particle model has gradually decreased. Multiple linear regression is often used for comparison with the prediction effect of the ANN method [55]. Some studies found that the complexity of the ANN structure did not guarantee better ozone prediction results. The prediction of ozone using a simplified ANN model (using only six regular meteorological parameters and time data as a covariate input) is also acceptable. Maximum temperature, atmospheric pressure, sunshine hours, and maximum wind speed have been shown to be the main input variables affecting the prediction of ozone concentration in ANN models [56]. The combination of an ANN and an adaptive radial basis function (ARBF) network has also been applied to predict the daily maximum ozone concentration [57]. Additionally, support vector machine regression (SVMR) and an RF model have been used to simulate urban ozone concentrations and assess human exposure [58,59]. The training ability of SVMR can be improved by verifying the significance of adding different variables.
Furthermore, a hybrid model for ozone prediction was proposed by Di et al. [60], which performs well even for low ozone concentrations. In this model, neural networks are used to simulate interaction and nonlinearity, and a convolution kernel is used to gather nearby information to account for spatial and temporal autocorrelation. The cross-validation R² at monitoring points was between 0.74 and 0.80 (average 0.76), indicating good model performance. Moreover, Durao et al. [61] developed a two-step method to predict tropospheric ozone concentrations. This method first determines the best prediction factors by classification and regression and then uses a multi-layer perceptron model to predict the O3 concentration at each monitoring location [62].
However, the concentration of ozone varies greatly in time due to rapid ozone exchange between the upper layer and the surface, which makes it difficult to predict ozone concentrations with statistical models based on ground stations combined with other variables. To address this problem, Tan et al. [57] conducted a statistical analysis and found that many air pollutants (CO2, CH4, NO2, etc.) are related to the vertical distribution of ozone. Based on this finding, an MLR model of columnar ozone for Peninsular Malaysia was developed. The correlation coefficient between the vertical distribution of columnar ozone predicted using the model and that measured by SCIAMACHY (Scanning Imaging Absorption Spectrometer for Atmospheric Chartography) was 0.75-0.80.

Nitrogen Oxides (NOx)
In the last decade, LUR has been the most popular method for NOx prediction. In 2001, a simple statistical prediction model was established, which allowed the prediction of NO2 concentrations several hours in advance [32]. Subsequently, LUR models have gradually gained popularity for estimating the high-resolution spatial distribution of NOx. Novotny used this method to estimate the NO2 concentration, and the results showed that the method could capture the distribution of NO2 in cities and areas close to roads [63]. Additionally, Huang et al. [64] found that the corrected LUR model can explain 87% of the NO2 concentration in Nanjing in 2013.
When using LUR models, different environmental variables should be considered at different research scales to improve prediction ability. At the regional scale or larger, meteorological factors can improve the prediction ability in addition to road and railway density, traffic volume, and land use, although short-term meteorological changes are generally not considered [65]. For example, Mavko et al. [66] found that, after incorporating wind direction into the regression model, the prediction capability for the spatial variation of NO2 concentration increased from 66% to 81%. Studies have also shown that the wind field information of global models (GEM-HiMAP) has a significant impact on the overall model prediction after it is incorporated into the LUR framework [67]. In addition, Li et al. [68] showed that it is more convenient to introduce wind direction in an LUR model based on a semi-circular buffer. At the block or street scale, more detailed urban morphology and traffic information should be included in the model. For example, Weissert et al. [69] developed a micro-LUR model for Auckland, New Zealand. The predictive variables identified at this scale included street width, distance to main roads, the presence of awnings, and the number of bus stops.
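One possible reading of a wind-aware semi-circular buffer is sketched below in Python: a predictor is accumulated only from sources that lie within the buffer radius and on the upwind side of the monitoring site. This is a geometric illustration under our own assumptions, not the formulation of Li et al. [68]; the function names, the planar (east, north) coordinates in metres, and the convention that wind direction is the direction the wind blows from, measured clockwise from north, are all hypothetical choices.

```python
import math


def in_upwind_semicircle(site, point, radius_m, wind_from_deg):
    """True if `point` is within `radius_m` of `site` on the upwind side.

    Coordinates are (east, north) in metres; `wind_from_deg` is the
    direction the wind blows from, in degrees clockwise from north.
    """
    dx = point[0] - site[0]
    dy = point[1] - site[1]
    if math.hypot(dx, dy) > radius_m:
        return False
    theta = math.radians(wind_from_deg)
    upwind = (math.sin(theta), math.cos(theta))  # unit vector pointing upwind
    # A non-negative projection means the point is in the upwind half.
    return dx * upwind[0] + dy * upwind[1] >= 0.0


def upwind_source_count(site, sources, radius_m, wind_from_deg):
    """Count sources inside the upwind semi-circular buffer."""
    return sum(
        in_upwind_semicircle(site, s, radius_m, wind_from_deg)
        for s in sources
    )


# Synthetic layout: one monitoring site and four emission sources.
site = (0.0, 0.0)
sources = [(0.0, 100.0), (0.0, -100.0), (0.0, 600.0), (50.0, 50.0)]

count_north_wind = upwind_source_count(site, sources, 500.0, 0.0)
count_south_wind = upwind_source_count(site, sources, 500.0, 180.0)
print(count_north_wind, count_south_wind)
```

The same membership test could be used to sum road length or traffic volume instead of counting point sources, yielding a direction-dependent predictor for the regression.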
Coupling land use models with other methods is an increasing trend in the field of NOx prediction. Adding Kriging or satellite data to specific NO2 LUR models can improve the prediction result, especially when predicting points far away from a monitoring position (clustering cross-validation) [70]. Araki et al. [71] established a spatiotemporal land use random forest (LURF) model for the metropolitan area of Japan, using RF to express the nonlinear relationship between NO2 concentration and predictive variables. Cross-validation testing showed that the R² value of the LURF model is 0.79, which is better than that of the traditional linear regression land use model (R² = 0.73).
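The cross-validation R² values quoted in this review can be understood through the following Python sketch of leave-one-out cross-validation for a one-predictor regression. It is an assumption-laden illustration: the single predictor (a road density score), the synthetic NO2 values, the function names, and the definition R² = 1 - SS_res / SS_tot computed on held-out predictions are our choices, and the cited studies may use other validation schemes (for example, clustered or k-fold) or a different R² definition.

```python
def fit_line(x, y):
    """Least squares intercept and slope for a single predictor."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loocv_r2(x, y):
    """R^2 computed from predictions at sites held out one at a time."""
    held_out = []
    for i in range(len(x)):
        # Refit the model without site i, then predict site i.
        intercept, slope = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        held_out.append(intercept + slope * x[i])
    mean_y = sum(y) / len(y)
    ss_res = sum((o - p) ** 2 for o, p in zip(y, held_out))
    ss_tot = sum((o - mean_y) ** 2 for o in y)
    return 1.0 - ss_res / ss_tot


# Synthetic data: road density score vs. NO2 concentration (made up).
road_density = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
no2 = [12.0, 15.0, 19.0, 21.0, 26.0, 28.0]

r2 = loocv_r2(road_density, no2)
print(round(r2, 3))
```

Because every prediction is made for a site the model has not seen, a cross-validated R² is generally lower than the in-sample R² and gives a fairer basis for comparing models such as LURF and linear LUR.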

Multiple Pollutants and AQI
Research on mixed pollutants focuses less on the accuracy of prediction methods than on obtaining more information on pollutant concentrations and distribution. With the development of science and technology, researchers are no longer satisfied with predicting single pollutants and also attempt to predict multiple pollutants. Multiple-pollutant research aims to study common and wide-ranging pollutants such as PM2.5, NO2, BC, O3, and SO2 holistically, in order to reflect the overall level of air pollution and provide a basis for environmental management. Therefore, the application of the AQI is urgently needed for the overall evaluation of air quality. Common air quality prediction models usually predict multiple pollutants first and then calculate the air quality index [72,73]. Accurate prediction of multiple pollutants is therefore an important prerequisite for AQI prediction models.
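The step from predicted pollutant concentrations to an index can be sketched in Python as follows. Many AQI schemes convert each concentration to a sub-index by piecewise linear interpolation between breakpoints and report the largest sub-index as the overall AQI. The breakpoint values below are placeholders invented for illustration, not those of any regulatory standard, and the function names are hypothetical; AQI definitions also differ between studies.

```python
# Hypothetical breakpoint tables: (conc_low, conc_high, index_low, index_high).
# These numbers are placeholders, NOT taken from any official AQI standard.
BREAKPOINTS = {
    "PM2.5": [(0.0, 50.0, 0, 100), (50.0, 150.0, 100, 200)],
    "O3": [(0.0, 100.0, 0, 100), (100.0, 200.0, 100, 200)],
}


def sub_index(pollutant, conc):
    """Piecewise linear interpolation of a concentration onto the index scale."""
    for c_lo, c_hi, i_lo, i_hi in BREAKPOINTS[pollutant]:
        if c_lo <= conc <= c_hi:
            return (i_hi - i_lo) / (c_hi - c_lo) * (conc - c_lo) + i_lo
    raise ValueError(f"{pollutant} concentration {conc} is outside the table")


def overall_aqi(concentrations):
    """Return (AQI, dominant pollutant): the largest sub-index wins."""
    subs = {p: sub_index(p, c) for p, c in concentrations.items()}
    dominant = max(subs, key=subs.get)
    return subs[dominant], dominant


# Predicted concentrations from some upstream multi-pollutant model (made up).
predicted = {"PM2.5": 75.0, "O3": 80.0}
aqi_value, dominant = overall_aqi(predicted)
print(aqi_value, dominant)
```

Since the overall index is driven by the worst sub-index, an error in the prediction of a single dominant pollutant propagates directly into the AQI, which is why multi-pollutant accuracy matters for AQI forecasting.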
The study of multiple pollutants is often complicated by differences in prediction performance between pollutants. For example, a composite pollution LUR model developed by Huang et al. [64] can explain 87%, 83%, 72%, and 65% of the variation in NO2, SO2, PM2.5, and O3, respectively. The correlations between pollutant types, seasons, and meteorological variables at different altitudes also differ. The sensitivity of pollutants to meteorological variables is higher in winter than in other seasons, and the sensitivity of ozone to meteorological variables differs from that of the other two pollutants [74]. Therefore, improving the overall prediction accuracy of composite pollution models is very difficult.

Air Pollutants and Their Health Impacts
Combining prediction models with environmental risk and exposure assessment is an important direction of air pollution prediction. By studying the relationship between pollutants and human health/the environment, the potential impacts of future air pollution can be evaluated. Meanwhile, using a stable and timely warning platform to predict the probability of pollution events or extreme pollution situations can provide an important basis for decision-makers [75].
Based on PM2.5 data and the numbers of emergency and inpatient hospital visits for respiratory and cardiovascular disease in New York City in 2004-2006, Weber et al. [16] found that high PM2.5 exposure was associated with an increased risk of asthma, myocardial infarction, and heart failure. Long-term exposure to air pollutants has adverse effects on the lung function of lung transplant recipients (LTRs), although macrolides may ameliorate this effect [76]. Air pollution also has adverse effects on fetal development. During the critical period of 2-8 weeks of pregnancy, exposure to PM with a diameter of less than 2.5 microns (PM2.5) seriously affects the development of the fetal heart [77]. By studying the relationship between behavior ratings and exposure to black carbon (BC) and fine particulate matter (PM2.5) in children aged 0-6 years, Harris et al. [78] concluded that air pollution may affect the development and function of the brain, thus affecting children's behavior.
To better evaluate the impact of air pollution on different activity spaces, scholars have focused on two spatial scales: (1) special places (industrial areas, mining areas, urban traffic light intersections, outdoor bus stations, road construction sites, etc.), and (2) urban-scale air pollution prediction. The prediction of air pollution in special places has a more direct impact on people's production and life. For example, polycyclic aromatic hydrocarbons (PAHs), which are emitted during combustion and have carcinogenic and genotoxic effects, are the focus of the prediction of indoor air pollution [79]. Additionally, assessing the impact of temple incense on exposure to PM and PAHs can provide guidance for the healthy lifestyles of residents [80]. Moreover, in the vicinity of industrial and traffic pollution sources, the concentration prediction and impact assessment of heavy metals and PM can provide urban managers with a scientific basis for urban planning and design. For example, it is possible to predict atmospheric heavy metal concentrations based on the biomagnetism of tree leaves [81].

Conclusions
Based on the number of papers, subjects, countries, institutions, journals, and keywords, this paper used bibliometrics and geographic evolutionary tree analysis to determine the research trends of statistical methods for air pollution prediction between 1990 and 2018. The results showed that, during the study period, an increasing number of subject areas became involved in the prediction of air pollutants, such as "Environmental Sciences & Ecology", "Meteorology & Atmospheric Sciences", "Engineering", and "Public, Environmental, & Occupational Health". Since 2000, researchers have started to pay more attention to the influence of PM on air pollution. The results of this analysis suggest that studies of the effects of air pollution on human diseases, urban pollution exposure models, and land use regression methods are cited more frequently.
Among the research papers analyzed in this study, PM is the most widely studied pollutant, followed by NOx and O3. The most popular statistical prediction methods are artificial neural networks (ANNs), land use regression (LUR), multiple linear statistical analysis, and multi-method coupling models. Air pollution prediction involves quantifying the concentration and spatial distribution of air pollutants that may appear in the future, and its ultimate purpose is to prevent the occurrence of pollution-related hazards. Therefore, pollution prediction cannot be superficial. Studies of air pollution assessment and early warning are particularly important. Although, after years of effort, the accuracy of air pollution prediction has been increased, more work remains to be done. In the future, it is necessary to further study the interaction mechanisms between air pollutants, human health, and the urban environment.
By using bibliometrics and the evolutionary tree method, this paper determined the evolution of statistical research methods for major air pollutants between 1990 and 2018. The results show that the ANN method became an increasingly popular means to predict air pollution, and the interaction between different pollutants attracted increasing research attention, with PM + NOx and PM + O3 being the main combinations. Furthermore, the results indicate that, since 2010, hybrid methods have rapidly joined the mainstream of air pollution prediction and, according to the Markov chain analysis, have been the fastest-developing research method. It is predicted that the future statistical prediction of air pollution will be based on hybrid methods to predict composite pollutants and that the interaction between pollutants will be a key focus. The methods used in this study may be instructive to other studies and could provide efficient and economical methods to understand the past and future state of research on air pollution prediction.
Acknowledgments: We want to thank the editor and anonymous reviewers for their valuable comments and suggestions on this paper.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A
Table A1. The number of publications in the field of air pollutant prediction in the 10 disciplines that were most active in publishing studies in this field during three periods.
Figure A1. A cooperation map of research institutions producing studies involving air pollutant prediction from 1990 to 2018 (note: the node size in the figure represents the number of documents issued by the institution, and the connection between nodes represents the cooperation between institutions; the more connections, the closer the partnership).
Figure A2. A co-citation map of the selected studies involving air pollutant prediction from 1990 to 2018 (note: each node represents a document, the node size represents the number of citations, and the connection between nodes represents the co-citation relationship).
The names are all air quality indexes, but the indexes defined in different articles may be different.
Multiple pollutants: Multiple pollutants episodes, including at least one of PM, NOx, or O3. These three kinds of simultaneous multiple air pollutant forecasting cases are not included in this study.
Artificial neural network: Artificial neural network
Gaussian process model: Gaussian process models and Gaussian-related models
Multiple: The abovementioned methods are used simultaneously, each method is used (not mixed), and the discussion is not biased to a particular method.
Hybrid: Mixtures of the above methods