1. Introduction
Air pollution control is a common concern of all countries in the world. It is of great significance for achieving the goals of peaking global carbon emissions and reaching carbon neutrality, effectively curbing global warming, and promoting sustainable development. Law enforcement investigations of air pollution incidents are an important means of air pollution control, and accurate detection of air pollution is an essential part of these investigations. Currently, air pollution is detected mainly through satellite remote sensing, ground monitoring, and similar means. These approaches incur considerable labor and economic costs, and in some areas they cannot provide complete coverage or timely detection. With economic and social development, growing public awareness of air protection, and the spread of the internet and smartphones, public complaints have become the main source of air pollution clues. The public directly perceives air quality [1], and people's perceptions of air pollution are reflected directly in their complaints. Timely and effective collection of statistics on public air pollution complaints therefore helps detect air pollution at a finer spatial scale. Because members of the public differ in cognition and educational background, complaint texts often contain ambiguous expressions and vague locations. At present, obtaining and confirming the time, address, and other details of each complaint, and investigating and analyzing it spatially, requires substantial manpower and material resources, which makes it difficult to identify air pollution through spatial analysis of public complaint data. Spatializing complaint information and combining it with regional air quality monitoring data makes it possible to analyze the types and distribution of air pollution effectively, identify frequent air pollution events in a region, and guide air pollution investigations, thereby improving the efficiency of law enforcement and the methods of air pollution control. GIS technology can efficiently visualize multi-source spatio-temporal data and has been widely used in studies of the natural and ecological environment [2]. Using GIS technology to spatialize air pollution complaints is therefore of great significance.
Extracting and analyzing information from public air pollution complaints is the basis for spatializing them. Scholars have carried out extensive studies on extracting and spatializing publicly contributed information on specific topics. Some studies have used administrative districts as the unit of analysis and relied on the location data attached to users’ social media posts to perform spatio-temporal sentiment analysis. Zhang et al. [3] used municipal administrative districts as the unit of analysis to examine the sentiment values and disaster severity in each city of Guangdong Province during the landfall of Typhoon “Mangkhut”. Chen et al. [4] extracted public opinion on the COVID-19 epidemic from micro-blog data and, using provincial administrative regions as the spatial unit of analysis, visualized the distribution of sentiment and the number of micro-bloggers in each province and municipality. Wang et al. [5] used micro-blog data with GPS information to analyze the “Beijing rainstorm event”, studying the spatial distribution of the event’s topics with the LDA model to locate the disaster, but did not reflect the degree of spatial aggregation of the topics. Because these studies used users’ network location information directly, and such information is often incomplete, their results are prone to considerable bias.
Therefore, some studies have used machine learning and deep learning to extract addresses and other key information from the text of public opinion posts, thereby obtaining richer and more complete information. Alves et al. [6] used the GeoSEn geoparser to extract geographic information from text and convert it into geographic coordinates for spatial sentiment analysis, but their results were biased by inaccurately detected locations. Chen Zhang et al. [7] used natural language processing tools to extract the sentiment tendency of micro-blog comments, and their results reflected the attention paid to the COVID-19 epidemic and the distribution of sentiment across regions. Han et al. [8] collected COVID-19-related micro-blog data and improved the topic word extraction algorithm of the biterm topic model (BTM) with BERT word vectors. Using topic word extraction and spatial clustering, they analyzed hot spots and the spatial distribution of sentiment and showed how public sentiment changed across regions. These machine learning and deep learning studies extracted topic words and key information, measured public attention and sentiment toward specific topics, and analyzed the spatial distribution of sentiment. Nevertheless, they have mainly worked at the provincial and municipal scales rather than at finer scales. Moreover, relatively few studies have addressed information extraction and text classification for public complaints about air pollution or analyzed the text content and spatial distribution of such complaints.
As an advanced pretrained language model, BERT further improves the generalization ability of word vector models: it fully describes character-level, word-level, sentence-level, and inter-sentence features and better characterizes syntactic and semantic information in different contexts [9]. Because its word vectors depend on context, BERT can represent the multiple meanings of a word. In the BERT+CRF model, BERT produces the word vectors, and a conditional random field (CRF) layer then decodes them to predict the best label sequence. Because air pollution complaint texts follow specific sentence patterns and contain complex information, the BERT+CRF model is well suited to classifying complaints and extracting relevant information from them.
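The CRF decoding step can be illustrated with a generic Viterbi sketch. The paper does not publish its decoding code, so the function name `viterbi_decode`, the tag set (O, B-ADDR, I-ADDR), and all scores below are hypothetical. In a BERT+CRF tagger, the emission scores would come from a linear layer on top of BERT's token vectors and the transition scores would be learned CRF parameters; end-of-sequence transitions are omitted here for brevity.

```python
from typing import List, Sequence, Tuple


def viterbi_decode(
    emissions: Sequence[Sequence[float]],
    transitions: Sequence[Sequence[float]],
    start: Sequence[float],
) -> Tuple[List[int], float]:
    """Return the highest-scoring tag sequence of a linear-chain CRF and its score.

    emissions[t][j]:   score of tag j at token t (hypothetically derived from BERT outputs)
    transitions[i][j]: score of moving from tag i to tag j
    start[j]:          score of starting the sequence with tag j
    """
    n_tags = len(start)
    # best[j]: best score of any partial path ending in tag j at the current token
    best = [start[j] + emissions[0][j] for j in range(n_tags)]
    backpointers: List[List[int]] = []
    for t in range(1, len(emissions)):
        new_best: List[float] = []
        pointers: List[int] = []
        for j in range(n_tags):
            # Choose the previous tag that maximises the score of reaching tag j
            prev = max(range(n_tags), key=lambda i: best[i] + transitions[i][j])
            new_best.append(best[prev] + transitions[prev][j] + emissions[t][j])
            pointers.append(prev)
        best = new_best
        backpointers.append(pointers)
    # Trace the pointers back from the best final tag
    last = max(range(n_tags), key=lambda j: best[j])
    path = [last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[last]


# Hypothetical tags: 0 = O, 1 = B-ADDR, 2 = I-ADDR
NEG = -100.0  # large penalty standing in for a forbidden transition
transitions = [
    [0.0, 0.0, NEG],  # O -> I-ADDR is invalid in BIO tagging
    [0.0, 0.0, 1.0],  # B-ADDR -> I-ADDR is rewarded
    [0.0, 0.0, 1.0],  # I-ADDR -> I-ADDR is rewarded
]
start = [0.0, 0.0, NEG]  # a sequence cannot start inside an entity
emissions = [
    [0.1, 2.0, 0.0],
    [0.0, 0.5, 1.0],
    [2.0, 0.0, 0.5],
]
path, score = viterbi_decode(emissions, transitions, start)
print(path, score)  # [1, 2, 0] 6.0
```

The transition matrix is what lets the CRF rule out invalid label sequences, such as O followed directly by I-ADDR, which independent per-token classification on top of BERT could still produce.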
Air quality is one of the most critical environmental issues in Beijing, China [10]. This study therefore uses the texts of Beijing air pollution complaints from 2019 to 2020 to extract more accurate complaint information. We use the BERT+CRF model to extract the names, addresses, and times of the complained-about sites from the texts and to classify the complaints by type; address matching then spatializes the complaints, yielding a more accurate spatio-temporal distribution of air pollution complaints at the township (street) level in Beijing. A comparison with the rankings of total suspended particulate (TSP) concentrations of Beijing’s townships (streets) shows a high degree of overlap between the areas with the most public air pollution complaints and the key pollution areas. The study shows that complaint types and other key information, combined with spatial analysis of regional air monitoring data, can reveal the causes and conditions of regional air pollution and provide guidance for air pollution law enforcement investigations.
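The extraction and address matching steps can be sketched as follows, assuming the BERT+CRF model emits BIO tags (the tag names ADDR and TIME are hypothetical) and approximating address matching by looking up the longest township (street) name contained in the extracted address. The paper does not specify its tag scheme or matching procedure; the function names and gazetteer entries are illustrative, and a production pipeline would more likely call a geocoding service. Tokens are joined with spaces for readability; for Chinese text tokenized per character, `sep=""` would be used instead.

```python
from typing import Dict, List, Optional, Tuple


def bio_spans(tokens: List[str], tags: List[str], sep: str = " ") -> List[Tuple[str, str]]:
    """Group BIO-tagged tokens into (entity_type, text) spans."""
    spans: List[Tuple[str, str]] = []
    cur_type: Optional[str] = None
    cur: List[str] = []

    def close() -> None:
        # Emit the currently open span, if any
        if cur_type is not None:
            spans.append((cur_type, sep.join(cur)))

    for tok, tag in zip(tokens, tags):
        if tag.startswith("I-") and tag[2:] == cur_type:
            cur.append(tok)  # continue the open span
        elif tag.startswith(("B-", "I-")):
            close()  # a B- tag (or a stray I- tag) starts a new span
            cur_type, cur = tag[2:], [tok]
        else:
            close()  # "O" ends any open span
            cur_type, cur = None, []
    close()
    return spans


def match_township(address: str, gazetteer: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """Return (township, district) for the longest gazetteer name found in the address."""
    hits = [name for name in gazetteer if name in address]
    if not hits:
        return None  # unmatched addresses would need manual review or a geocoder
    name = max(hits, key=len)
    return name, gazetteer[name]


# Hypothetical tagged complaint and gazetteer
tokens = ["Restaurant", "fumes", "near", "Ronghua", "Street", "Daxing", "since", "May"]
tags = ["O", "O", "O", "B-ADDR", "I-ADDR", "I-ADDR", "B-TIME", "I-TIME"]
spans = bio_spans(tokens, tags)
print(spans)  # [('ADDR', 'Ronghua Street Daxing'), ('TIME', 'since May')]

gazetteer = {"Ronghua Street": "Daxing", "Xinhua Street": "Tongzhou", "Gaomidian Street": "Daxing"}
print(match_township(dict(spans)["ADDR"], gazetteer))  # ('Ronghua Street', 'Daxing')
```

Preferring the longest match keeps a short name that happens to be contained in a longer one from winning, which matters when township names share common suffixes such as "Street" or "Town".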
The remainder of the paper is structured as follows: in Section 2, we describe the experimental data sources and the deep learning methods for complaint text classification and information extraction; in Section 3, we introduce the experimental parameter settings, results, and evaluation; in Section 4, we verify the spatial relationship between complaints and polluted areas using regional air monitoring data and analyze the complaint characteristics of the main complaint-intensive areas; finally, several conclusions are given in Section 5.
4. Discussion
Total suspended particulate (TSP) matter refers to particulate matter with a particle size of less than 100 μm. The larger the particles, the shorter they remain in the air and the shorter the distance they travel; because TSP includes these coarse particles, TSP concentrations in Beijing can reflect local environmental pollution.
For statistical analysis, we selected the top 10 townships (streets) in the TSP concentration ranking for Beijing’s plain area in months with distinct seasonal characteristics in 2019 and 2020, as shown in Figure 4. The TSP concentration ranking in Beijing is compiled every half month; for 2019, we used the rankings for the first and second halves of March, May, August, October, and December.
The figure shows that the Daxing, Tongzhou, and Chaoyang Districts, with 15, 13, and 10 townships (streets) on the list, respectively, are the districts with the most listed townships and the most concentrated rankings in each statistical month, indicating that these three districts contain recurring sources of air pollution. We therefore examined the relationship between changes in regional air quality and complaints by analyzing the complaint types and the differences in complaint distribution among the listed townships (streets) in these three districts, and explored the main causes of regional air pollution.
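Counting how many townships (streets) of each district appear on the half-month top-10 lists is a simple tally. The sketch below assumes each ranking is stored as (township, district) pairs; the function names are hypothetical and the rankings are invented and truncated. Because the text does not say whether a township that appears in several periods is counted once or once per period, both tallies are shown.

```python
from collections import Counter
from typing import Dict, List, Tuple

# One half-month TSP ranking: (township, district) pairs in rank order
Ranking = List[Tuple[str, str]]


def districts_on_list(rankings: Dict[str, Ranking]) -> Counter:
    """Number of distinct townships (streets) per district that appear in any ranking."""
    listed = {(town, district) for ranking in rankings.values() for town, district in ranking}
    return Counter(district for _, district in listed)


def appearances_per_district(rankings: Dict[str, Ranking]) -> Counter:
    """Total number of top-10 slots each district occupies across all periods."""
    return Counter(district for ranking in rankings.values() for _, district in ranking)


# Hypothetical, truncated rankings for two half-month periods
rankings = {
    "2019-03 first half": [
        ("Ronghua Street", "Daxing"),
        ("Xinhua Street", "Tongzhou"),
        ("Sanlitun Street", "Chaoyang"),
    ],
    "2019-03 second half": [("Ronghua Street", "Daxing"), ("Gaomidian Street", "Daxing")],
}
distinct = districts_on_list(rankings)
slots = appearances_per_district(rankings)
print(distinct["Daxing"], slots["Daxing"])  # 2 3
```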
Using the kernel density analysis function of GIS, we combined the complaint times and the extracted complaint information to produce a heat map of air pollution complaints in the Daxing, Tongzhou, and Chaoyang Districts in 2019, and marked on it the townships (streets) of these three districts that appeared in the top 10 of the 2019 Beijing TSP concentration ranking.
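The kernel density surface behind such a heat map can be sketched as follows. The paper does not state which kernel, search radius, or cell size its GIS tool used; this sketch assumes a quartic (biweight) kernel, as used for example by ArcGIS's Kernel Density tool, projected coordinates in metres, and an unweighted count of complaints. The function names, parameters, and complaint locations are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]  # projected coordinates, e.g. metres


def quartic_density(cell: Point, events: Sequence[Point], radius: float) -> float:
    """Quartic-kernel density of complaint points at `cell`, in events per unit area."""
    r2 = radius * radius
    total = 0.0
    for x, y in events:
        d2 = (x - cell[0]) ** 2 + (y - cell[1]) ** 2
        if d2 < r2:  # points beyond the search radius contribute nothing
            total += (3.0 / math.pi) * (1.0 - d2 / r2) ** 2
    return total / r2


def density_grid(
    events: Sequence[Point],
    xmin: float,
    ymin: float,
    nx: int,
    ny: int,
    cell: float,
    radius: float,
) -> List[List[float]]:
    """Evaluate the density at the centre of every cell of an ny-by-nx raster (rows are y)."""
    return [
        [
            quartic_density((xmin + (i + 0.5) * cell, ymin + (j + 0.5) * cell), events, radius)
            for i in range(nx)
        ]
        for j in range(ny)
    ]


# Hypothetical complaint locations (metres) and a 500 m search radius
complaints = [(0.0, 0.0), (100.0, 50.0), (2000.0, 2000.0)]
grid = density_grid(complaints, xmin=-1000.0, ymin=-1000.0, nx=40, ny=40, cell=100.0, radius=500.0)
peak = max(max(row) for row in grid)
```

Each complaint contributes a total mass of one, so the surface summed over cells and multiplied by the cell area approximates the number of complaints inside the raster extent. The search radius controls smoothness: a larger radius merges nearby clusters into broader hotspots, while a smaller one isolates individual sites. Here the peak falls in the cell between the two nearby complaints.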
As shown in Figure 5, complaints are concentrated in three areas: (1) the area extending from the Chaoyang District to the northwest of the Tongzhou District, including the Ronghua Street area of the Daxing District Economic Development Zone; (2) Caiyu Town and Changziying Town in the Daxing District and Yujiawu Town and Yongledian Town in the Tongzhou District; and (3) the part of the Daxing District stretching from Gaomidian Street in the north to Panggezhuang Town in the south.
In the complaint-intensive area extending from the Chaoyang District to the northwest of the Tongzhou District, including Ronghua Street in the Daxing Economic Development Zone, the townships (streets) in the top 10 of the TSP concentration ranking with more intensive complaints include Olympic Village Street, Xiaoguan Street, Sanlitun Street, Hujialou Street, Pingfang Town, Gaobeidian Town, Sanjianfang Town, and Guanzhuang Town in the Chaoyang District, and six streets in the Tongzhou District, including Xinhua Street, Zhongcang Street, and Yuqiao Street. In this densely populated and economically active area, 43% of the complaints concerned restaurant fume pollution, 21% waste gas pollution, and 17% mobile source pollution. These were the top three complaint types and mainly involved the catering, urban heating, auto repair, and domestic service industries. In Ronghua Street, Daxing Economic and Technological Development Zone, waste gas pollution accounted for 50% of all complaints, mainly targeting the manufacturing industry. In the southern part of this area, complaints cluster around a general waste treatment plant and landfill built there, which nearby residents have complained about repeatedly.
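Once each complaint has been classified and assigned to an area, percentages like those above are simple proportions of the classified labels within that area. A minimal sketch, assuming the classifier's output is available as a list of type labels per area; the function name and labels are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def type_shares(complaint_types: Iterable[str]) -> Dict[str, float]:
    """Percentage of complaints in each classified pollution type, largest first."""
    counts = Counter(complaint_types)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {kind: 100.0 * n / total for kind, n in counts.most_common()}


# Hypothetical classifier output for the complaints located in one area
labels = ["restaurant fume", "waste gas", "restaurant fume", "dust"]
print(type_shares(labels))
# {'restaurant fume': 50.0, 'waste gas': 25.0, 'dust': 25.0}
```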
In Caiyu Town and Changziying Town in the Daxing District and Yujiawu Town and Yongledian Town in the Tongzhou District, complaints are concentrated on dust pollution and waste disposal pollution, which together account for 70% of the complaints. The complaint-intensive area in the Daxing District, from Gaomidian Street in the north to Panggezhuang Town in the south, is densely populated, but its economic activities differ from those of the city center: 96% of its complaints concern restaurant fume pollution and waste gas pollution, mainly from the catering, auto repair, printing, and small manufacturing industries.
In contrast, complaints in the other listed towns and streets, including those in the Tongzhou District, are only sporadic, and mainly concern waste gas and dust from small manufacturing enterprises and odor from some farms in the area. The pollution complaints in Anding Town in the Daxing District mostly concerned the landfill within the town. Yufa Town and Lixian Town in the Daxing District had fewer complaints, mainly about construction works, dust-generating sites, and straw burning.
The rankings for the first and second halves of March, May, August, October, and December 2020 show that the Daxing, Tongzhou, and Chaoyang Districts have 13, 16, and 8 townships (streets) on the list, respectively, again the highest of any districts (Figure 6). Compared with 2019, the Tongzhou District now has the most concentrated rankings across the statistical months, although the overall distribution of TSP concentrations is similar to that of 2019. Based on these ranking characteristics, we further analyzed the complaint types and specific conditions in the listed townships (streets) of the three districts.
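The year-over-year shift in listed townships (streets) can be made explicit with the counts reported for 2019 and 2020; the snippet only restates the numbers given in the text, and the variable names are illustrative.

```python
# Townships (streets) on the half-month TSP top-10 lists, as reported in the text
listed_2019 = {"Daxing": 15, "Tongzhou": 13, "Chaoyang": 10}
listed_2020 = {"Daxing": 13, "Tongzhou": 16, "Chaoyang": 8}

# Change per district from 2019 to 2020
change = {d: listed_2020[d] - listed_2019[d] for d in listed_2019}
print(change)  # {'Daxing': -2, 'Tongzhou': 3, 'Chaoyang': -2}
print(max(listed_2020, key=listed_2020.get))  # Tongzhou
print(sum(listed_2019.values()), sum(listed_2020.values()))  # 38 37
```

The combined total barely changes, so the shift toward Tongzhou reflects a redistribution among the three districts rather than a change in how many of their townships were listed.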
Using the same GIS kernel density analysis, we produced heat maps of air pollution complaints in the Daxing, Tongzhou, and Chaoyang Districts for 2020 and marked the townships (streets) of these three districts that appeared in the top 10 of the 2020 Beijing TSP concentration ranking. Figure 7 shows that complaint intensity in 2020 was weaker than in 2019, which we attribute to the lower intensity of economic activity in Beijing during the COVID-19 epidemic and the correspondingly smaller number of complaints; the overall distribution, however, changed little.
Complaints are concentrated in the area extending from the Chaoyang District to the northwest of the Tongzhou District, including the Ronghua Street area of the Daxing District Economic Development Zone; in several townships (streets) in the northern part of the Daxing District; and in Tongzhou townships such as Xiji Town, Kuoxian Town, and Zhangjiawan Town. There are also sporadic complaint areas in the Daxing District and the southern part of the Tongzhou District.
The map shows that, in the complaint-intensive area extending from the Chaoyang District to the northwest of the Tongzhou District, including Ronghua Street in the Daxing District Economic Development Zone, the townships (streets) in the top 10 of the TSP concentration ranking include Heizhuanghu Town, Dougezhuang Town, and Guanzhuang Town in the Chaoyang District and Taihu Town, Zhongcang Street, and Xinhua Street in the Tongzhou District. In this area, 40% of complaints concern dust pollution, and 54% concern garbage disposal, waste gas, and restaurant fume pollution combined.
These complaint-intensive listed townships (streets) lie mainly at the junction of the Chaoyang and Tongzhou Districts, where the main economic activities are catering, urban heating, auto repair and painting, garbage disposal, domestic services, and other polluting industries. Of the complaints from Ronghua Street in the Daxing Economic Development Zone and from Taihu Town in the Tongzhou District, 43.5% concerned waste gas pollution, 22.3% dust pollution, and 18.8% odor pollution, mainly targeting manufacturing, landfill and resource recycling, and farming.
The complaint-intensive area in the northern part of the Daxing District includes six townships (streets) on the TSP concentration ranking list, such as Gaomidian Street and Qingyuan Street. In these areas, complaints about restaurant fumes, dust, and waste gas accounted for 79% of all complaints and were mainly directed at the catering, auto repair, printing, and small manufacturing industries.
The other complaint-intensive areas include Xiji Town, Kuoxian Town, and Zhangjiawan Town in the Tongzhou District, where complaints mainly concerned waste gas, dust, and odor pollution from the processing and manufacturing industries. In Yongledian Town of the Tongzhou District and Qingyundian Town of the Daxing District, complaints mainly concerned waste gas and odor pollution from waste disposal.
There were also sporadic complaints about villages in listed townships (streets), such as Lucheng Town and Majuqiao Town in the Tongzhou District and Caiyu Town and Yufa Town in the Daxing District; these mainly concerned waste gas, dust, and odor pollution from small- and medium-sized manufacturing, farming, and catering businesses.
In 2020, there were fewer complaints from Anding Town, Changziying Town, and Lixian Town in the Daxing District, Yujiawu Town in the Tongzhou District, and several other listed townships (streets), where the main pollution was waste gas, dust, and odor from manufacturing, construction projects, and aquaculture. Because this area borders Hebei Province, it is also vulnerable to cross-regional pollution. Nevertheless, because the population is sparse and local residents have little awareness of their right to environmental protection, complaints there are few despite the poor environmental quality.
Comparing the TSP concentration rankings with the complaint hotspots in the major townships (streets) in 2019 and 2020 shows that most of the top 10 townships (streets) by TSP concentration in both years lay in complaint-intensive areas. This suggests that the public is sensitive to changes in regional environmental quality and tends to complain about conventional pollution sources nearby. However, because residents live in different areas and differ in education level and rights awareness, and because air pollution diffuses, complaint density was uneven across areas, and complaints about different types of pollution were distributed differently.
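The statement that most top-ranked townships lie in complaint-intensive areas can be quantified as an overlap share. The paper does not define a hotspot threshold; this sketch assumes hotspots are townships (streets) whose mean kernel density (for example, from a zonal statistics step over the density surface) reaches a chosen value. The function names, township names, and numbers are all hypothetical.

```python
from typing import Dict, Set


def hotspot_townships(mean_density: Dict[str, float], threshold: float) -> Set[str]:
    """Townships (streets) whose mean complaint density reaches the chosen threshold."""
    return {town for town, value in mean_density.items() if value >= threshold}


def share_in_hotspots(top_townships: Set[str], hotspots: Set[str]) -> float:
    """Fraction of top-ranked TSP townships that lie in complaint hotspots."""
    if not top_townships:
        return 0.0
    return len(top_townships & hotspots) / len(top_townships)


# Hypothetical zonal means of the density surface and a hypothetical top-ranked list
mean_density = {"Town A": 8.2, "Town B": 6.5, "Town C": 1.1, "Town D": 7.9, "Town E": 0.4}
hotspots = hotspot_townships(mean_density, threshold=5.0)
top_ranked = {"Town A", "Town B", "Town C", "Town D"}
print(share_in_hotspots(top_ranked, hotspots))  # 0.75
```

Reporting this share for several thresholds would show how sensitive the overlap is to the chosen hotspot definition.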