Visualization and Analysis of Air Pollution and Human Health Based on Cluster Analysis: A Bibliometric Review from 2001 to 2021

Bibliometric techniques and social network analysis are employed in this study to evaluate 14,955 papers on air pollution and health that were published from 2001 to 2021. To track the research hotspots, the principle of machine learning is applied in this study to divide 10,212 records of keywords into 96 clusters through OmniViz software. Our findings highlight strong research interests and the practical need to control air pollution to improve human health, as evidenced by an annual growth rate of over 15.8% in the related publications. The cluster analysis showed that clusters C22 (exposure, model, mortality) and C8 (health, environment, risk) are the most popular topics in this field of research. Furthermore, we develop co-occurrence networks based on the cluster analysis results in which a more specific keyword classification was obtained. These key areas include: “Air pollutant source”, “Exposure-Response relationship”, “Public & Occupational Health”, and so on. Future research hotspots are analyzed through characteristics of the cluster groups, including the advancement of health risk assessment techniques, an interdisciplinary approach to quantifying human exposure to air pollution, and strategies in health risk assessment.


Introduction
The negative effects of air pollution on human health are well documented [1,2]. Harmful air pollutants, such as PM2.5, PM10, SO2, and NOx, escaping into the environment through natural and human activities may adversely affect human health [3]. Air pollution has acute and chronic effects on various systems and organs. These range from upper respiratory tract irritation to chronic respiratory and heart diseases [4], lung cancer [5], childhood acute respiratory infections [6], and adult chronic bronchitis; air pollution can also exacerbate existing cardiopulmonary diseases or asthma attacks [7] and even cause serious mental illness [8]. According to the official statistics of the World Health Organization, air pollution kills as many as 7 million people every year, and 9 out of every 10 people in the world still breathe air containing high levels of pollutants [9].
Due to the growing public concern about the adverse health consequences caused by air pollution, an increasing number of publications have examined the correlation between exposure to air pollutants and the incidence and mortality of various diseases [10][11][12][13][14][15][16]. For example, some studies have shown that exposure to particulate matter and ozone in the air is associated with increased mortality and hospitalization due to respiratory and cardiovascular disease [17,18]. Particulate matter is likely to penetrate the lungs and

Data Collection
The Web of Science Core Collection provides a variety of records for each publication, including author information, journals, citations, and institutional affiliations. We searched mainly for articles indexed in SCI and SSCI and published in English from 2001 to 2021. The search combined the keywords ("air pollution*" or PM2.5 or ozone or "particular matter") and (health or mortality or fatality or death or epidemiology or fitness or morbidity). A total of 46,934 pieces of literature were identified through database searches. After excluding 11,963 pieces of literature of the non-article type and an additional 2566 via screening of the titles and abstracts, 14,955 articles were full-text screened for eligibility (see Figure 1, Figure S1, and Table S1).
In addition, OmniViz (BioWisdom Ltd., Cambridge, UK) was used to extract and cluster keywords from the articles. OmniViz is an advanced visual informatics software package designed to visualize digital data, categorical data, genomic sequences, chemical structures, and text documents [27]. It can analyze large data sources through different clustering methods. We used OmniViz to identify important topics and hot research areas. The clustering method measures the similarity of two records in a high-dimensional space. To visualize the data, we used Galaxy and Thememap. Galaxy displays the relationships among a large number of records, and Thememap identifies the most important topics in the field. Please see Figure S2 in the supplementary file for details on methodology and software.
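The idea of measuring the similarity of two records in a high-dimensional space can be illustrated with a small sketch. OmniViz is proprietary and its exact similarity measure is not described here, so the cosine similarity over bag-of-keywords vectors below is an assumed stand-in, and the three keyword lists are hypothetical examples rather than records from the dataset.

```python
import math
from collections import Counter


def keyword_vector(keywords):
    """Represent one record as a sparse bag-of-keywords count vector."""
    return Counter(k.strip().lower() for k in keywords)


def cosine_similarity(a, b):
    """Cosine similarity of two sparse count vectors (0.0 when they share no keyword)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical records (assumption: illustrative only, not taken from the study).
rec1 = keyword_vector(["exposure", "model", "mortality"])
rec2 = keyword_vector(["exposure", "mortality", "ozone"])
rec3 = keyword_vector(["aerosol", "source", "dust"])

sim_12 = cosine_similarity(rec1, rec2)  # two of three keywords shared
sim_13 = cosine_similarity(rec1, rec3)  # no keyword shared
```

Each distinct keyword is one dimension, so records that share more keywords end up closer together, which is the property a clustering step would rely on.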

Impact Factors
The impact factor (IF) and h-index are well-recognized indicators that are closely related to the bibliometric analysis. The IF is a useful indicator to quantify the rank and quality of a journal. The IF of a journal is calculated by dividing the citation count of the current year by the number of published articles in the journal during the previous two years [28]. It is created by the Institute of Scientific Information (ISI). A higher IF usually reflects a journal's higher quality in various research fields. The h-index means that 'h' of one's total articles are cited at least 'h' times. It is a popular indicator to measure the performance of a scientist and has been widely used to evaluate the academic performance of a journal or a country.
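Both indicators are simple to compute once citation counts are available. The sketch below is a minimal illustration under the usual reading of the definitions above: the IF numerator is taken to be the citations received in the current year by items published in the previous two years, and the function names and sample numbers are hypothetical.

```python
def impact_factor(citations_this_year, articles_prev_two_years):
    """Citations in the current year to items from the previous two years,
    divided by the number of articles published in those two years."""
    if articles_prev_two_years <= 0:
        raise ValueError("number of articles must be positive")
    return citations_this_year / articles_prev_two_years


def h_index(citation_counts):
    """Largest h such that h articles have at least h citations each."""
    ranked = sorted(citation_counts, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


# Hypothetical numbers for illustration only.
example_if = impact_factor(300, 120)
example_h = h_index([10, 8, 5, 4, 3])
```

In the h-index example, four articles have at least four citations each, but there are not five articles with at least five citations, so the index is 4.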

Social Network Analysis (SNA)
Social network analysis (SNA) refers to a computable analysis method based on multidisciplinary fusion theories and methods to understand the formation of various human social relationships, behavioral characteristics analysis, and the laws of information transmission. It aims to quantify the network's structure features and the dynamic interactions among network vertices. Due to the development of social network theory, SNA has been widely used to analyze academic collaboration in different fields [29]. By using a variety of measurement metrics, the contributions from different countries, institutions, and scientists can be evaluated.
In this study, SNA is used to evaluate collaboration among different countries and institutions [30], which includes two steps. The first step is information extraction. The country and institutional information for each author was extracted using BibExcel so that the visualization effect of academic cooperation among different countries can be presented. The second step is to draw a cooperation diagram with the input data from BibExcel using Pajek to visualize their cooperation patterns.
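The two steps can be sketched in a few lines. This is not the BibExcel or Pajek workflow itself; it is an assumed, simplified illustration of how country-level collaboration counts could be derived from author affiliations, with hypothetical input data.

```python
from collections import Counter
from itertools import combinations


def collaboration_edges(papers):
    """Count, for each unordered pair of countries, how many papers they co-authored.

    `papers` is a list where each item lists the countries of one paper's authors.
    """
    edges = Counter()
    for countries in papers:
        unique = sorted(set(countries))  # each country counts once per paper
        for a, b in combinations(unique, 2):
            edges[(a, b)] += 1
    return edges


# Hypothetical affiliation data (assumption: illustrative only).
papers = [
    ["USA", "China"],
    ["USA", "China", "Canada"],
    ["USA", "USA"],      # single-country paper, contributes no edge
    ["Germany", "USA"],
]
edges = collaboration_edges(papers)
```

The resulting edge weights are the kind of input a network drawing tool needs; a heavier weight would be drawn as a thicker line between two countries.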

The Performance of Related Publications
In recent years, air pollution has gradually drawn public attention. Our study examines the effect of air pollution on health from a bibliometric perspective in order to understand the current research status and future research trends. Figure 2 shows the annual number (NO) of articles published between 2001 and 2021, the total number of citations (TC) for the articles, and the average citation count (ACPP) for each article. The NO grew slowly in the first 12 years and has increased rapidly, with a growth rate of more than 9%, since 2008. The articles published after 2009 account for nearly 85% of the total number of published articles. In addition, the TC grew steadily during the first 11 years, peaked in 2008 and 2018, and then gradually declined. Due to the increasing number of published articles, the ACPP showed an overall downward trend.
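The indicators in this paragraph can be computed as follows. The text does not state how the growth rate was calculated, so the year-over-year and compound definitions below are assumptions, and the yearly counts are hypothetical values chosen only to make the arithmetic easy to follow.

```python
def acpp(total_citations, n_articles):
    """Average citations per publication (TC divided by NO)."""
    return total_citations / n_articles if n_articles else 0.0


def annual_growth_rates(counts_by_year):
    """Year-over-year growth rate of the publication count, keyed by the later year."""
    years = sorted(counts_by_year)
    rates = {}
    for prev, cur in zip(years, years[1:]):
        if counts_by_year[prev] > 0:
            rates[cur] = (counts_by_year[cur] - counts_by_year[prev]) / counts_by_year[prev]
    return rates


def compound_annual_growth_rate(first, last, n_years):
    """Constant yearly rate that turns `first` into `last` over `n_years` years."""
    return (last / first) ** (1 / n_years) - 1


# Hypothetical yearly article counts (assumption: not the study's data).
counts = {2001: 100, 2002: 110, 2003: 121}
rates = annual_growth_rates(counts)
cagr = compound_annual_growth_rate(counts[2001], counts[2003], 2)
```

Because the ACPP divides citations by the number of articles, it can fall even while total citations rise, which is consistent with the downward trend described above.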

Publication Features of Different Countries
The number of publications reflects each country's academic strength and attention in the field. The top five most productive countries are the United States (6932 publications), China (4350 publications), the United Kingdom (2206 publications), Canada (1035 publications), and Italy (974 publications). These top five countries published a total of over 10,000 articles, accounting for 78.55% of all publications. Among these productive countries, the United States outperforms the others in the total number of published articles on air pollution during 2001-2021 [31]. China's publications have grown rapidly since 2014, with an average growth rate of over 30%. Moreover, the amount of literature published by Chinese scholars gradually approached the USA's productivity in 2021.
We applied SNA to analyze the international collaboration among the 20 most productive countries during the period 2001-2021 (Figure 3). The lines connecting the countries represent their cooperation, and the line thickness indicates the degree of collaboration [32,33]. Collaboration was determined by the affiliations of the co-authors, and all countries or institutes stand to benefit if one publication is a collaborative study [34,35]. These 20 productive countries worked closely with each other, particularly the U.S.A., China, Canada, Germany, and the U.

The Performances of Different Journals
The top 20 most productive journals are shown in Table 1. These productive journals account for 63.4% of the total related publications. In particular, Atmospheric Environment is the most productive journal with a count of 1506 (10.07%) articles. Other dominating journals include Environmental Health Perspectives, Science of the Total Environment, and Environmental Research. The IF of the journal is not the only index to reflect the journal's influence in the field. Therefore, we calculated the average citations to better reflect the journal's influence. The results showed that Epidemiology and Environmental Health Perspectives have the highest average citations (86.25 and 81.30), followed by the American Journal of Epidemiology, Occupational and Environmental Medicine, and Environmental Science and Technology.
Furthermore, the field of air pollution and health is typically an interdisciplinary area. According to the statistics (Figure 4), the largest proportion of research areas are in Environmental Sciences and Ecology; Public, Environmental, and Occupational Health; Meteorology and Atmospheric Sciences; and Toxicology, which account for 93.2% of the total number of publications. Among them, the Environmental Sciences and Ecology research area maintains an average growth rate of more than 15%, occupying the most important position.

Institutions' Performances
The performances of the top 20 most productive institutions are listed in Table 2. Most institutions are from the productive countries shown in the previous section. Among them, twelve institutions are located in the U.S.A. and three are in China. Harvard University is the most productive research organization with 1263 publications, followed by the University of California System, the United States Environmental Protection Agency, and the Chinese Academy of Sciences. In the U.S.A., universities and government research institutes, such as the United States Environmental Protection Agency, are the main forces in the field. Additionally, several European countries, such as England, the Netherlands, and Switzerland, have mature experience in air pollution prevention and reducing its negative impact on health. Therefore, it is not surprising that four European institutions are listed among the top 20 most productive institutions.
A machine learning-based cluster analysis was carried out on the keywords of the 14,955 research articles using OmniViz. A total of 10,212 records were obtained and divided into 96 clusters. We selected the clusters with more than 50 records, 29 clusters in total, as shown in Table 3; the 20 most frequent keywords are listed in Table S2. Cluster C22 (exposure, model, mortality) has the most records, with a total of 3629, which accounts for 37.1% of the total. In addition, C8 (health, environment, risk), C92 (health, environment, risk), C35 (exposure, person, particle), C6 (model, ozone, emit), and C64 (aerosol, source, dust) each contain more than 300 records. These clusters are also frequently chosen as research topics. In addition, the distance between clusters on the galaxy map reflects their correlation: clusters located close to each other are highly relevant to one another, whereas clusters far apart are not strongly related.
The clusters with more than 50 records are shown in Figure 5 and marked in yellow. C8, C47, C81, and C25 are close to each other and concentrated at the top of the galaxy map. This indicates that keywords such as "environment, health, and risk" are often associated with "exposure, chronic, disease, heart," and so on. Similarly, C89, C64, C35, C58, and C56, located in the middle and lower parts of the galaxy map, are also closely linked. This suggests that the keywords "particle, ozone, dust" and other related pollutants are closely related to "aerosol, concentration, source". In addition, clusters C8 and C92 carry the same term labels, "health, environment, risk", and their record counts are above 500. However, they come from different collections of publications at different positions on the galaxy map (Figure 5). At the pollutant level, each pollutant has different sources and measurement methods [36]. At the health level, various diseases are involved, and related modeling methods are often employed [37].
In addition, as shown in Figure 6, a theme map can identify the main topics in the research field of air pollution and health, which is an effective complement to the Galaxy visualization. The height of a peak depends on the intensity of the topic and the concentration of information at that location. The four highest peaks are "health, environment, risk", "exposure, model, mortality", "exposure, ozone, person", and "model, ozone, emit". The results are very similar to those of the cluster analysis; however, there are some differences in the height order. The main reason is that, although C22 has the most records, it is not closely related to the surrounding clusters. Cluster C8 is denser at its position, resulting in a stronger theme and hence the highest peak. Compared with others, the peaks around "exposure, person, particle" are significantly denser. There are more valleys surrounding the rest of the peaks, which suggests that such research is highly relevant and often involves an interdisciplinary approach.

Relationship of Keywords among Different Groups
Through keyword clustering, 10,212 records were divided into 96 clusters. In the keywords Galaxy map (Figure 7), we can roughly divide them into three groups, i.e., Group I, Group II, and Group III, based on the positional relationships of different clusters. Although the clusters located at the lower left of the Galaxy map are very dense, most of these clusters appear less than 50 times and, therefore, will not be analyzed further. Within each group, we develop co-occurrence networks and obtain more specific keyword classifications based on keyword frequency, relationship, and semantic analysis. The research hotspots are analyzed through the characteristics of three groups.
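A keyword co-occurrence network of the kind described above can be built by counting how often two keywords appear in the same record. The sketch below is an assumed, simplified illustration (it does not reproduce the study's semantic analysis); the records are hypothetical, and weighted degree is used here only as one plausible way to pick out a central keyword.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(records):
    """Return (keyword frequency, pair co-occurrence counts) over all records."""
    freq = Counter()
    pairs = Counter()
    for keywords in records:
        unique = sorted({k.lower() for k in keywords})
        freq.update(unique)                      # one count per record
        pairs.update(combinations(unique, 2))    # one count per keyword pair
    return freq, pairs


def weighted_degree(pairs):
    """Sum of co-occurrence weights on the edges touching each keyword."""
    degree = Counter()
    for (a, b), weight in pairs.items():
        degree[a] += weight
        degree[b] += weight
    return degree


# Hypothetical keyword records (assumption: illustrative only).
records = [
    ["environment", "health", "risk"],
    ["health", "exposure", "mortality"],
    ["environment", "health", "cost"],
]
freq, pairs = cooccurrence(records)
degree = weighted_degree(pairs)
central_keyword = degree.most_common(1)[0][0]
```

A keyword that is both frequent and strongly connected, like "health" in this toy input, plays the role of a central keyword around which related research areas are grouped.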
(1) Group I The impact of the deterioration of the ecological environment, especially of the air quality, on human health has received a growing level of attention. The damage to health caused by air pollution further increases the degree of health inequalities among groups of different income levels [38,39]. As shown in Figure 7, "environment and health" are recognized as the central keywords of Group I because of their high frequencies and close relationships with other research topics. Focusing on two central keywords, we identified four relevant research areas, i.e., "Air pollutant source", "Exposure-Response relation- (1) Group I The impact of the deterioration of the ecological environment, especially of the air quality, on human health has received a growing level of attention. The damage to health caused by air pollution further increases the degree of health inequalities among groups of different income levels [38,39]. As shown in Figure 7, "environment and health" are recognized as the central keywords of Group I because of their high frequencies and close relationships with other research topics. Focusing on two central keywords, we identified four relevant research areas, i.e., "Air pollutant source", "Exposure-Response relationship", "Health & Mortality", and "Cost & Benefit". In terms of air pollutant sources, outdoor sources often refer to the cluster C37 (Emit, vehicle, industry), mainly including industry and vehicle emissions. Household pollution sources, such as the keyword clusters C77 (Fuel, energy, household) and C20 (Standard, management, ventilation), are often combined with specific issues, such as fuel burning, use of building materials, chemicals, and ventilation in household activities [40]. Most air pollutants monitored by remote sensing technology are always assessed by the exposure-response function [41], such as the clusters C6 (Model, Ozone, emit) and C89 (Particle, aerosol, concentration), to reflect such research trends. 
In addition, some studies have stated that air pollution can cause increased morbidity, including heart disease and chronic diseases [42]. And in Table S3, we also summarized highly co-cited documents among research clusters in Clusters I.
(2) Group II As shown in Figure 8, "exposure" is recognized as a core keyword, and three keywords with high frequency are closely related (i.e., particle, aerosol, source). With these four keywords as the core, four relevant research areas have been formed, namely "Air pollution source", "Air pollution monitoring", "Particulate matter concentration", and "Atmospheric aerosol". The current methods of air pollution control focus on source management, so identifying air pollution sources is still an important research area [43].Moreover, dust in cities can also increase the short-term mortality of vulnerable populations, such as extreme dust episodes in high-density Asian cities [44]. In order to formulate reasonable air pollution control policies, real-time monitoring of air quality becomes more important. In addition to common air monitoring stations [45], some studies suggest that air quality monitoring can be performed in innovative ways, such as through social media [46] and mobile sensors [47]. Though PM and ozone are the main monitored pollutants, related studies have gradually evolved from single-pollutant to multi-pollutant collaborative studies, such as ozone, particle, carbon monoxide, PM 2.5 , PM 10 , etc. [48]. As shown in Figure 8, there is a strong relationship between "particle" and "aerosol". Studies have found that severe haze pollution incidents were mainly caused by the formation of secondary aerosols. Moreover, in Table S4, we also summarized highly co-cited documents among research clusters in Clusters II. ship", "Health & Mortality", and "Cost & Benefit". In terms of air pollutant sources, outdoor sources often refer to the cluster C37 (Emit, vehicle, industry), mainly including industry and vehicle emissions. 
Household pollution sources, such as the keyword clusters C77 (Fuel, energy, household) and C20 (Standard, management, ventilation), are often combined with specific issues, such as fuel burning, use of building materials, chemicals, and ventilation in household activities [Error! Reference source not found.]. Most air pollutants monitored by remote sensing technology are always assessed by the exposureresponse function [41], such as the clusters C6 (Model, Ozone, emit) and C89 (Particle, aerosol, concentration), to reflect such research trends. In addition, some studies have stated that air pollution can cause increased morbidity, including heart disease and chronic diseases [Error! Reference source not found.]. And in Table S3, we also summarized highly co-cited documents among research clusters in Clusters I.
(2) Group II As shown in Figure 8, "exposure" is recognized as a core keyword, and three keywords with high frequency are closely related (i.e., particle, aerosol, source). With these four keywords as the core, four relevant research areas have been formed, namely "Air pollution source", "Air pollution monitoring", "Particulate matter concentration", and "Atmospheric aerosol". The current methods of air pollution control focus on source management, so identifying air pollution sources is still an important research area [Error! Reference source not found.].Moreover, dust in cities can also increase the short-term mortality of vulnerable populations, such as extreme dust episodes in high-density Asian cities [Error! Reference source not found.]. In order to formulate reasonable air pollution control policies, real-time monitoring of air quality becomes more important. In addition to common air monitoring stations [Error! Reference source not found.], some studies suggest that air quality monitoring can be performed in innovative ways, such as through social media [46] and mobile sensors [47]. Though PM and ozone are the main monitored pollutants, related studies have gradually evolved from single-pollutant to multi-pollutant collaborative studies, such as ozone, particle, carbon monoxide, PM2.5, PM10, etc. [48]. As shown in Figure 8, there is a strong relationship between "particle" and "aerosol". Studies have found that severe haze pollution incidents were mainly caused by the formation of secondary aerosols. Moreover, in Table S4, we also summarized highly co-cited documents among research clusters in Clusters II. (3) Group III As shown in Figure 9, "aerosol" is the central word in this section. Around the central keyword, four related research areas can be identified, including "Atmospheric physics", "Atmospheric chemistry", "Health & Mortality", and "Public & Occupational Health". (3) Group III As shown in Figure 9, "aerosol" is the central word in this section. 
Around the central keyword, four related research areas can be identified, including "Atmospheric physics", "Atmospheric chemistry", "Health & Mortality", and "Public & Occupational Health". Atmospheric physics and atmospheric chemistry are the basic disciplines in related research. They can explain the formation and transmission mechanisms of atmospheric pollutants and provide a theoretical basis for controlling air pollution [49]. The research field "Health & Mortality" was once again emphasized in the clustering results. Compared with Group I, this part has more diseases mentioned. The measurement of health risks from the mortality index [50] gradually concentrated on the incidence of specific diseases, including cancer, heart disease, respiratory diseases, and so on. Further, based on the complexity of multipollutant collaborative research [51], studies must develop more targeted evaluation models and apply new model fusion methods, i.e., the air pollution mortality/incidence risk (Ri-MAP) model [52], comprehensive health risk index, exposure-response coefficient [53], CMAQ/GCAM evaluation model [54], and so on. We are pleasantly surprised to find that public and occupational health research is further valued in Group III. Some studies currently classify occupational characteristics and social status to assess the health effects of air pollution on different groups [55]. For example, the socioeconomic status of parents, including education, income, and living area, has an impact on children's health, which suggests that health can play an important role in the intergenerational transmission of economic status [56]. In addition, the highly co-cited documents among research clusters in Cluster II have been summarized in Table S5.

Summary
This study conducted a bibliometric analysis of 14,955 articles on air pollution and health published from 2001 to 2021. Over the past two decades, the most productive country, institution, and journal were the United States, Harvard University, and Atmospheric Environment, respectively. We used OmniViz to cluster the keywords of 10,212 records, and the results show that the clusters with the most occurrences were "Exposure, model, mortality", "Health, environment, risk", and "Model, Ozone, emit". The Thememap and Galaxy visualizations indicate that popular topics also include "Health, environment, risk" and "Exposure, person, particle". Based on the clustering results, we developed co-occurrence networks and obtained a more specific keyword classification according to keyword frequency, relationship, and semantic analysis. We identified the most influential areas, such as "Air pollutant source", "Exposure-Response relationship", and "Public & Occupational Health". The research hotspots were analyzed through the characteristics of three groups of clusters. Overall, this paper provides a qualitative and quantitative evaluation of current research progress and trends in air pollution and health research.
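As an illustration of how a keyword co-occurrence network can be derived from bibliographic records, the sketch below counts how often two keywords appear in the same record and treats each count as an edge weight. It is a simplified stand-in, not the OmniViz procedure used in this study; the function name and the sample keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(records):
    """Count how often each unordered keyword pair appears in the same record."""
    pair_counts = Counter()
    for keywords in records:
        # Normalise case and drop repeats within a single record
        unique = sorted({k.strip().lower() for k in keywords if k.strip()})
        # Sorting makes each pair a canonical (a, b) tuple with a < b
        pair_counts.update(combinations(unique, 2))
    return pair_counts

# Hypothetical keyword lists, not drawn from the study's dataset
records = [
    ["exposure", "particle", "aerosol"],
    ["exposure", "particle", "source"],
    ["aerosol", "Particle"],
]
edges = cooccurrence_counts(records)
for (a, b), weight in edges.most_common():
    print(a, b, weight)
```

Each resulting pair can be read as a weighted edge between two keyword nodes, so frequently co-occurring keywords end up strongly connected in the network.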


Limitations and Future Research Directions
Although bibliometric analysis is an effective method for reviewing the literature, it is not without limitations. First, the bibliometric data from the Web of Science (including SCI-EXPANDED and SSCI) are not produced exclusively for analysis; the data may therefore contain errors, and such errors are bound to influence any analysis performed using them. To mitigate this, we carefully cleaned the retrieved bibliometric data, for example by removing duplicates and erroneous entries. Second, the nature of the bibliometric method is itself a limitation. Bibliometric analysis is quantitative in nature, so its qualitative assertions can be subjective, and the relationship between qualitative and quantitative results is often unclear. To address this problem, we combined machine learning with the bibliometric method to analyze the results more accurately, but there is still room to improve the method. Third, bibliometric studies can only forecast a research field over a limited period, and scholars should therefore avoid making overly ambitious assertions about their research field. Notwithstanding these limitations, the bibliometric method helps to make large bibliometric datasets tractable and supports retrospectives of air pollution and human health research. Bibliometric methods can not only advance knowledge in this field but also help us to better understand research trends. We take a short yet significant step in that direction.
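The duplicate-removal step mentioned above can be sketched as follows. This is only one plausible approach, assuming each exported record is a dictionary with hypothetical "doi" and "title" keys; it does not reproduce the cleaning procedure actually applied in this study.

```python
import re

def normalise_title(title: str) -> str:
    """Lower-case a title and collapse punctuation and whitespace to single spaces."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records):
    """Keep the first record for each DOI, falling back to the normalised title."""
    seen = set()
    unique = []
    for rec in records:
        doi = (rec.get("doi") or "").strip().lower()
        # Prefer the DOI as the identity key; use the title only when no DOI exists
        key = ("doi", doi) if doi else ("title", normalise_title(rec.get("title", "")))
        if key in seen:
            continue
        seen.add(key)
        unique.append(rec)
    return unique

# Hypothetical records for illustration only
records = [
    {"doi": "10.1000/example.1", "title": "Air pollution and health"},
    {"doi": "10.1000/EXAMPLE.1", "title": "Air Pollution and Health."},
    {"doi": "", "title": "Haze and mortality"},
    {"doi": None, "title": "Haze and  mortality!"},
]
cleaned = deduplicate(records)
print(len(cleaned))
```

A known gap in this sketch is that a record carrying a DOI and a copy of the same paper without one are treated as distinct, so manual screening would still be needed for such cases.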
Based on the bibliometric analysis, three main future directions of air pollution and health are identified in this study. The first future research direction is the advancement of health risk assessment techniques. At present, the air monitoring network does not take into account health factors such as changes in air pollution components, public medical data, and population activity patterns [57]. There is a lack of dynamic data to support the health risk assessment as the monitoring network of air pollution's health impact is far from mature [58]. Health risk assessment techniques currently face the challenge of transformation from qualitative research to quantitative research. Future research can be conducted to examine the route and trajectory of pollutant exposure to determine the actual intake and intake coefficient of air pollutants [59]. Subsequently, sophisticated time-activity patterns and individual exposure monitoring techniques should be employed more widely to determine accurate exposure doses [60]. More suitable biological targets should be identified to establish a quantitative relationship between the growth in the concentration of pollutants and the increase in mortality and disease prevalence.
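As a rough illustration of how time-activity patterns feed into an exposure dose, the sketch below computes a time-weighted average concentration and an inhaled dose as the sum of concentration x inhalation rate x duration over microenvironments. The formulation is a common simplification rather than a method taken from the cited studies, and every number and name in it is an assumption.

```python
def time_weighted_exposure(activities):
    """Average concentration (ug/m3) weighted by hours spent in each microenvironment."""
    total_hours = sum(hours for _, hours, _ in activities)
    if total_hours <= 0:
        raise ValueError("total time must be positive")
    return sum(conc * hours for conc, hours, _ in activities) / total_hours

def inhaled_dose(activities):
    """Inhaled mass (ug): sum of concentration x duration x inhalation rate."""
    return sum(conc * hours * rate for conc, hours, rate in activities)

# Each tuple: (concentration in ug/m3, hours, inhalation rate in m3/h); all values assumed
day = [
    (35.0, 14.0, 0.5),   # home
    (20.0, 8.0, 0.6),    # office
    (80.0, 2.0, 1.5),    # commuting
]
print(time_weighted_exposure(day))
print(inhaled_dose(day))
```

In this hypothetical day, the short commute contributes disproportionately to the inhaled dose because both the concentration and the breathing rate are higher, which is the kind of effect that fixed-site monitoring alone cannot capture.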
The second future research direction is an interdisciplinary approach to quantifying human exposure to air pollution. Quantifying human exposure to air pollutants is a challenging task [61]. Exposure results from multifaceted relationships and interactions between environmental and human systems, adding complexity to the assessment process [62]. For assessment of the health risks of air pollution, related studies might evolve from epidemiology research to toxicology research. From the perspective of environmental toxicology, future studies can be undertaken on the biological mechanism of the bioavailability and toxicity of particulate matter. In order to establish more accurate quantitative models, real-time air pollution health risks can be described using mobile air pollution monitoring techniques, meteorological information, and land use information [61]. Air quality changes and mobile monitoring allow relevant departments to respond to these changes quickly. Consequently, the threshold for the impact of different pollutants on human health can be determined more accurately.
The third future research direction is management strategies in health risk assessment [63]. Proven management strategies include establishing and enforcing air quality standards, reducing emissions from coal-fired power plants and other stationary sources, banning the use of polluting fuels in urban centers, and improving access to public transportation [64]. Future air pollution prevention strategies will emphasize integration and cooperation. Air pollution control in key areas should pay attention to regional and departmental cooperation. Policy boundaries will become increasingly blurred, and mandatory regulatory policies may also need to incorporate economic incentives. An economic incentive policy on its own will reach only a small audience and needs to be combined with other policies to innovate the current policy tool system. Risk management strategies will shift from simple restrictions and prohibitions to more flexible multi-policy coordination.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/ijerph191912723/s1. Figure S1: Framework of literature search, analysis, and interpretation. Note: Science Citation Index (SCI); Social Science Citation Index (SSCI). Figure S2: Methodology and software support. Table S1: Keywords used in the search and results. Table S3: Highly co-cited documents among research clusters in Cluster I. Table S4: Highly co-cited documents among research clusters in Cluster II. Table S5: Highly co-cited documents among research clusters in Cluster III.
Author Contributions: All authors contributed equally to this work. D.L. was responsible for analyzing the results with machine learning and bibliometric methods. K.C. made a substantial and meaningful contribution to the revision. K.H. was responsible for supervision. H.D. was responsible for reviewing, editing, and visualization with software. T.X. and Z.C. were responsible for searching, collecting, and screening related literature. Y.S. was responsible for writing, editing, and reviewing. All authors have read and agreed to the published version of the manuscript.