Perceiving Residents’ Festival Activities Based on Social Media Data: A Case Study in Beijing, China

Abstract: Social media data contain information expressed in real time, including text and geographical location. As a new data source for crowd behavior research in the era of big data, they can reflect some aspects of residents’ behavior. In this study, a text classification model based on BERT and the Transformers framework was constructed and used to classify and extract more than 210,000 posts about residents’ festival activities from the 1.13 million Sina Weibo (the Chinese “Twitter”) posts collected in Beijing in 2019. On this basis, word frequency statistics, part-of-speech analysis, topic modeling, sentiment analysis, and other methods were used to perceive different types of festival activities and to quantitatively analyze the spatial differences between different types of festivals. The results show that traditional culture significantly influences residents’ festivals, reflecting residents’ motivation to participate in festivals, how they participate, and how they express their emotions. There are apparent spatial differences in residents’ participation in festival activities. The main festival activities are distributed in the central area within the Fifth Ring Road in Beijing. In contrast, posts expressing feelings during festivals are mainly distributed outside the Fifth Ring Road. The research integrates natural language processing, topic model analysis, spatial statistical analysis, and other techniques. It also broadens the application field of social media data, especially text data, provides a new research paradigm for studying residents’ festival activities, and adds residents’ perceptions of festivals. The research results provide a basis for the design and management of the Chinese festival system.


Introduction
Festivals are one of the representative cultures of a country, nation or region. They have multiple functions such as gathering social consensus, inheriting traditional culture, and enriching spiritual life [1]. In the process of modernization and globalization, and with economic growth, civic living conditions have gradually improved. Associated with this, the life-autonomy of city residents has increased, and options for festival activities have increased [1,2]. How to provide full access to and enrich the diverse culture and functions of traditional Chinese festivals and the revival of traditional Chinese culture are issues of current general concern to Chinese society.
In the process of globalization and modernization, conflicts and exchanges between different cultures have gradually increased. There are relatively few studies that apply big data to the perception of festival culture. The research frameworks more commonly used to investigate folk customs and social activities are based on analysis of the actual situation, combined with theoretical analysis, and the results are then used to suggest management options [3,4]. Many studies have examined the inheritance and development of Chinese festivals from various perspectives and have suggested improvements. Zhang proposed that Chinese festivals are in an era of development and made suggestions on the design of Chinese festivals from the perspective of history and folklore [1]. Wang briefly reviewed the inheritance and development of Chinese traditional festivals in Hong Kong, Macau, and Taiwan [5]. Li used methods such as field investigations, questionnaire surveys, and literature studies to analyze the status of traditional Chinese festivals and to propose further developments [6]. However, collecting information through field trips, questionnaire surveys, or interviews is costly in time and money and is subject to restrictions on questionnaire design, interview rules, and personal subjective factors [7]. These restrictions greatly affect the accuracy of the data, and because the temporal and spatial coverage of the samples is small, the reliability of the data and conclusions is at risk. Most research on festivals and culture has nevertheless relied on field trips, questionnaire surveys, literature research, and similar methods [8,9].
With the widespread adoption of mobile devices and location-based services, social media data have increasingly attracted the attention of scholars due to their large user base, rich spatiotemporal and semantic information, and low cost of access [10,11]. Meanwhile, understanding how conversational discourse on online social networks changes semantically and geographically over time will help reveal the dynamic changes of interpersonal relationships and digital traces of social events [12]. Xie and others used sign-in data of the Sina Weibo social media platform in Beijing in 2016. They used the TF-IDF (term frequency-inverse document frequency) algorithm based on geographic location information and spatial clustering to locate hot spots in Beijing in order to study social and cultural differences and crowd behaviors between different areas of Beijing [13].
Studying people's behavior is of great importance to urban planning and design and to the improvement of the living standards of residents [14,15]. Traditional methods of collecting human behavior data such as surveys are only suitable for small sample research projects. Moreover, these methods are time-consuming and costly, and the results obtained are difficult to update. In recent years, people are willing to disclose useful personal information on social media [16]. How to fully mine social media data to obtain residents' opinions of festivals has become an important topic of current research. Garay used social media (especially Twitter) to analyze the potential contribution of festivals in generating the image of festival destinations, but their research goals were more focused on the commercial value of festivals [17]. Zhou selected Sina Weibo data from 2012 to 2014 related to the five traditional festivals of the Spring Festival, the Lantern Festival, the Qingming Festival, the Dragon Boat Festival, and the Mid-Autumn Festival. People's perception of traditional Chinese festivals and regional differences in their perception of traditional festivals were investigated using word frequency analysis and LDA theme analysis [18].
Existing research has achieved important results on festival activities and human perceptions [10,19]. Liu and others used social media data to study the daily activities of residents. Their proposed framework integrates textual semantic analysis, statistical methods, and spatial techniques, broadens the application areas of social media data, especially text data, and provides a new paradigm for research on residents' activities and spatiotemporal behavior [20]. However, there are relatively few studies that analyze residents' festival activities from the two aspects of text mining and spatial analysis. We process the unpredictable, sparse, and irregular data that appear in location-based social networks (LBSNs) and convert these uncertain, noisy geo-tagged data into useful, well-structured, high-level information [21,22] (for example, the spatial distribution of festival events). Minatel proposed that constructing an LBSN from stay points presents much more information, since GPS logs convey more of users' mobility information [23]. Explaining such information clearly enough to support better decisions on festival development remains a very challenging task. Few studies have used big data from this perspective. Therefore, there is still a lot of room for research on festival activities based on social media data.
On a small scale, such as that of a region or city, the comparison of residents' perceptions of various festivals needs further research.
Using big data analysis and text mining research methods, it is possible to examine the attitudes, activities, and preferences of people in different areas of a city, and reveal social, cultural, and functional characteristics of hot spots [24,25]. Such research methods can also be used to enhance cultural perception, to explore cultural connotations of traditional Chinese festivals in order to revive traditional Chinese festivals, and to provide suggestions and solutions to meet the requirements of the current era [26].
Using social media data from the Sina Weibo platform, and drawing on their text and spatiotemporal information, we study residents' festival activities from two aspects: text mining and spatial analysis. By integrating natural language processing, spatial analysis, statistical analysis, and other techniques, the study provides a new research paradigm for festival culture research. This research focuses on the behavioral characteristics of Beijing residents' festival activities and their perceptions of various types of festivals. First, festival activity behaviors are classified by extracting keywords and other information from Weibo text. The spatial patterns of the various behaviors are then mapped. Finally, the sensing and spatial characteristics of residents' festival activities are discussed.
The rest of this article is organized as follows. In Section 2, data collection and research methods are introduced. In Section 3, the results of sorting and categorizing the information of residents' festival activities are described, and the semantic characteristics, perceived content, and temporal and spatial patterns of residents' festival activities are analyzed. In Section 4, the advantages and disadvantages of the research methods used in this article are discussed. Finally, in Section 5, we summarize our study, draw conclusions, and propose future research directions.

Study Area
Beijing is the capital of the People's Republic of China, a national central city, and a megacity. It is the national political center, cultural center, international exchange center, and science and technology innovation center, as approved by the State Council. As of 2018, the city had 16 districts with a total area of 16,410 square kilometers. At the end of 2019, the permanent population was 21.536 million and the urban population was 18.65 million. The urbanization rate was 86.6%. The GDP of the Beijing area was 3537.13 billion Yuan. The added value of the tertiary industry accounted for 83.5% of the regional GDP [27]. Beijing was rated as a world first-tier city by the Globalization and World Cities Research Network (GaWC) [28]. According to the seventh Chinese national census, of the 21.893 million permanent residents in Beijing, 11.9% are aged 0-14, 68.5% are aged 15-59, and 19.6% are aged 60 and above [29]. Beijing is an ancient capital with a history of more than 3000 years and a rich historical and cultural heritage. The city is also a symbol and image of China, and a primary window to show China to the world. It has always attracted great attention at home and abroad.

Data
Sina Weibo is a social media platform with a large amount of social media data. According to the Sina Weibo User Development Report 2020 [30], the number of monthly active users of the software reached 511 million. Statistics from the Weibo Data Center in December 2020 show that Sina Weibo has a very high coverage rate of the city population in first-tier cities such as Beijing, Shanghai, Guangzhou, and Shenzhen [30]. The Sina Weibo data contains a considerable amount of various geographic information. Through the Sina Weibo software and web crawler tools, we obtained Sina Weibo data from Beijing for the year 2019 and captured the content of Weibo posts in a targeted manner. The data included Weibo ID, latitude and longitude, time, mobile terminal, region, text content, and other information. In total, more than 1.13 million individual pieces of data were obtained as the data source for this study (Figure 1).

Methods
The research framework for residents' festival activities is shown in Figure 2.

BERT (Bidirectional Encoder Representations from Transformers), released in 2018, is able to translate input sentences or paragraphs into corresponding semantic features; it has performed remarkably well and has become an important recent advancement in NLP [31]. In this research, we used the Simple Transformers library [32], which is based on the Transformers library by HuggingFace [33], to build our model. The model can be quickly trained and evaluated.
First, after cleaning the 1,136,125 Weibo posts (removing tags, attached emails, forwarded links, emoticons, videos, shared pictures, and other information unrelated to the text content), the pre-trained BERT-base-Chinese model was initialized for binary classification. Second, 7000 randomly selected posts were used as training samples. Each post was marked 1 if it was related to residents' festival activities and 0 otherwise. Machine learning methods and the original BERT model were then used to verify the classification accuracy. By adjusting the relevant parameters and the number of iterations over several experiments, a trained text classification model was obtained (the model accuracy reached 97%). Third, using the resulting classifier, all Weibo posts were input into BERT to identify those related to residents' festival activities. After classification and extraction, there were 213,649 social media posts related to festivals.
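The cleaning and accuracy-checking steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in the study: the regular expressions, the function names `clean_post` and `accuracy`, and the sample post are all hypothetical, since the exact cleaning rules are not listed here.

```python
import re

# Hypothetical noise patterns; the study does not list its exact rules.
URL_RE = re.compile(r"https?://\S+")            # forwarded links
TOPIC_RE = re.compile(r"#[^#]+#")               # Weibo topic tags look like #topic#
MENTION_RE = re.compile(r"@[\w\-]+")            # @user mentions
EMOTICON_RE = re.compile(r"\[[^\[\]]{1,8}\]")   # emoticon codes such as [doge]


def clean_post(text: str) -> str:
    """Strip links, topic tags, mentions, and emoticon codes, then squeeze spaces."""
    for pattern in (URL_RE, TOPIC_RE, MENTION_RE, EMOTICON_RE):
        text = pattern.sub(" ", text)
    return " ".join(text.split())


def accuracy(y_true, y_pred) -> float:
    """Share of posts whose predicted label (1 = festival, 0 = other) is correct."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    print(clean_post("#Mid-Autumn# mooncakes with family [doge] http://t.cn/abc @friend"))
    print(accuracy([1, 0, 1, 1], [1, 0, 0, 1]))
```

The cleaned text would then be passed to the classifier; the accuracy helper only compares hand-assigned labels with predicted ones on a held-out sample.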

Word Frequency Statistics
Word frequency statistics based on the TF-IDF algorithm evaluate the importance of a word to a text. If a word or phrase appears frequently in one article but rarely in the rest of the document collection, it is considered to have good ability to distinguish categories [34].
Specifically, the Weibo data were first segmented with Jieba. The purpose was to split the text into an ordered sequence of words. Word segmentation is equivalent to feature extraction, and the extracted words are called feature words. Because the text was complex and the vocabulary large, a custom dictionary and a stop word database were then used to filter out some prepositions and symbols. Finally, the feature words that played a major role in text classification and topic analysis were selected and ranked in order of importance.
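To make the ranking step concrete, the sketch below computes TF-IDF scores for posts that are assumed to be already segmented and stop-word filtered (the study uses Jieba for segmentation; that step is not reproduced here). The unsmoothed weighting tf × log(N / df) is one common variant and is an assumption, as are the function names and the toy word lists.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {word: score} dict per tokenized document.

    tf  = count of the word in the document / document length
    idf = log(N / number of documents containing the word)
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each word once per document
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (count / total) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return scores


def top_words(score, k=3):
    """Feature words of one document, ranked from most to least important."""
    return sorted(score, key=score.get, reverse=True)[:k]


if __name__ == "__main__":
    posts = [["moon", "cake", "eat"], ["eat", "zongzi"], ["eat", "moon", "moon"]]
    for score in tf_idf(posts):
        print(top_words(score))
```

A word that occurs in every post, such as "eat" in the toy data, receives a score of zero under this variant, which is why frequent but undiscriminating words drop out of the ranking.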

Topic Model LDA
Latent Dirichlet allocation (LDA) is a typical "bag of words" model [34] and has a wide range of applications [35]. It is a standard topic model that can work with social media data, whose texts are short and highly sparse [36]. Its basic idea is that each text is generated as a random mixture of latent topics, and each topic corresponds to a specific distribution over feature words [37].
This study constructed a three-layer Bayesian structure of "text-topic-word" based on social media data. The topic of each text in the text set is given in the form of a probability distribution, so as to classify topics according to the topic distribution. This research attempted to create a list of topics through the results to examine the spatial characteristics of Beijing residents' festival activities and to visualize the results [38].
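For reference, the three-layer "text-topic-word" structure corresponds to the standard LDA generative process, written out below. The symbols are conventional notation and do not appear in this article: $D$ texts, $K$ topics, $N_d$ words in text $d$, and Dirichlet hyperparameters $\alpha$ and $\beta$.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha), & d &= 1,\dots,D \\
\varphi_k &\sim \mathrm{Dirichlet}(\beta), & k &= 1,\dots,K \\
z_{d,n} \mid \theta_d &\sim \mathrm{Categorical}(\theta_d), & n &= 1,\dots,N_d \\
w_{d,n} \mid z_{d,n}, \varphi &\sim \mathrm{Categorical}(\varphi_{z_{d,n}})
\end{align*}
```

Here $\theta_d$ is the topic distribution of text $d$ (the text-topic layer), $\varphi_k$ is the word distribution of topic $k$ (the topic-word layer), $z_{d,n}$ is the topic assigned to the $n$-th word of text $d$, and $w_{d,n}$ is the observed word. Classifying a text by topic amounts to reading off the largest components of the inferred $\theta_d$.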

Spatial Analysis
Spatial analysis is a widely used analysis method in geography [39]. In this study, the main focus is on the spatial distribution of data. Related methods include density analysis, spatial interpolation analysis, spatial visualization, and measurement of geographic distribution [40]. Festival-related Weibo content was displayed in space through topic clustering and kernel density analysis was performed to observe the hot spots in the space.
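As an illustration of kernel density analysis, the sketch below evaluates a two-dimensional density surface at a single location from a set of post coordinates. It assumes a Gaussian kernel with a single bandwidth and planar (projected) coordinates; GIS software may use a different kernel and bandwidth rule, and the function name and sample points are hypothetical.

```python
import math


def gaussian_kde_2d(points, query, bandwidth):
    """Density at `query` from a 2D Gaussian kernel with a shared bandwidth.

    `points` and `query` are (x, y) pairs in planar coordinates,
    e.g. metres in a projected coordinate system.
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    if not points:
        return 0.0
    qx, qy = query
    # Normalising constant of a 2D Gaussian, averaged over all points.
    norm = 1.0 / (2.0 * math.pi * bandwidth ** 2 * len(points))
    total = 0.0
    for px, py in points:
        d2 = (px - qx) ** 2 + (py - qy) ** 2
        total += math.exp(-d2 / (2.0 * bandwidth ** 2))
    return norm * total


if __name__ == "__main__":
    posts = [(0.0, 0.0), (0.5, 0.2), (0.1, -0.3), (8.0, 8.0)]
    print(gaussian_kde_2d(posts, (0.0, 0.0), bandwidth=1.0))  # inside the cluster
    print(gaussian_kde_2d(posts, (4.0, 4.0), bandwidth=1.0))  # between clusters
```

Evaluating the function on a regular grid of query locations gives the density surface from which hot spots are read; the bandwidth controls how smooth that surface is.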

Festive Event Word Frequency Statistics
Festivals with more than 10,000 Weibo posts were National Day, Mid-Autumn Festival, New Year's Day, Christmas Day, Lantern Festival, and Christmas Eve ( Table 1). As 2019 was the 70th anniversary of the founding of the People's Republic of China, most Weibo posts were related to the National Day. The status of the family in the Chinese people's concept of festival is culturally important, and hence the Mid-Autumn Festival with the theme of a family reunion was the second largest festival-related Weibo content in 2019. All of the 213,649 festival-related Weibo posts from Beijing in 2019 were sorted by word frequency statistics ( Figure 3). As 2019 was the 70th anniversary of the founding of the People's Republic of China, the frequency of words related to the National Day, such as "motherland", "happy birthday", "70", was high. The frequency of entries related to the Mid-Autumn Festival was also high. In Chinese festival activities, eating food was clearly an essential behavior and a principal way people participated in festivals.

Based on all festival-related Weibo content in 2019, the main content of residents' perception of festivals and the main ways of participating in festivals were reflected in word cloud diagrams (Figure 4). High-frequency words corresponded to festivals with a large number of Weibo posts in 2019. For example, words such as "motherland", "China", and "happy birthday" were also reflected in word cloud maps for National Day, Mid-Autumn Festival, New Year, and other related words. Words such as "eat" and "delicious" reflected the main ways that residents participate in festivals.
The festivals were divided into three categories: traditional festivals, foreign festivals, and modern festivals, sorted according to the number of related posts from most to least, and the proportion of the number of posts of various types of festivals in the total data were calculated. The results were shown in Table 2.
Beijing residents posted the largest number of Weibo posts related to traditional festivals, accounting for 40.46%. Among the traditional festivals, the Mid-Autumn Festival, with its theme of family reunion, was the most frequently mentioned. However, the number of Weibo posts related to the Spring Festival was relatively small. This is because the Spring Festival spans a long period, whereas only the Weibo data from the day of the holiday were extracted here, so the number of Weibo posts is biased. In addition, Weibo users tend to be young, and hence the Weibo post data may not reflect the feelings of middle-aged and elderly people.
Traditional festivals are closely related to Chinese history and culture. In order to explore the degree of attention to traditional culture on Weibo, it is necessary to analyze some relatively low-frequency words among the characteristic words (Table 3). Residents' festival activities are greatly influenced by traditional culture. This is reflected not only in clothing and locations, such as "Hanfu" and "Confucian Temple", but also, more obviously, in traditional festivals: phrases corresponding to the Mid-Autumn Festival, such as "Will live long as he can!" and "From far away you share this moment with me.", appear more frequently.
Table 3. Traditional culture-related word frequencies.

Words Related to Traditional Culture Frequency
He who does not reach the Great Wall is not a true man.

Semantic Sensing of Festival Activities
We also found that some festival activities, especially some foreign festivals, have a certain connection with religion (Table 4). In the posted texts, not only are the names of religions explicitly mentioned, but the names of religious places also appear relatively frequently on the day of the festival.
Figure 5 shows the internal proportions of parts of speech within each type of festival data, allowing a longitudinal comparison within the same type of data. For each type of festival, it shows the proportions of the different parts of speech and word types. In traditional festivals in particular, verbs make up the largest proportion of words, which is significantly different from other types of festivals. Figure 6 is a horizontal comparison of different types of festival data for the same parts of speech. Modern festivals have the most holiday features in nouns, and traditional festivals have "eating" as the most common verb.
Nouns reflected residents' perception of festivals, especially the representative symbols and elements of festivals, for example, the nouns "moon cake", "zongzi", and "tangyuan", as these traditional Chinese foods were used in relation to the traditional festivals, i.e., Mid-Autumn Festival, Dragon Boat Festival, and Lantern Festival, respectively. Words such as "Santa Claus", "Christmas gift", and "apple" were used, related to foreign festivals, i.e., Christmas and Christmas Eve. For modern festivals, words like "mother country" and "China", related to National Day, were used frequently.

Regardless of the type of festival, the word "Forbidden City" appeared frequently. This indicates that the local attraction of the Forbidden City has become an indispensable part of festivals in the perception of Beijing residents, providing an emotional support and cultural symbol. Finally, the proportion of Weibo terms of each type of festival showed that the proportion of traditional festivals was the largest, as high as 59%, which showed that residents had the most abundant perception of traditional festivals.
All high-frequency words were divided into four categories according to part of speech and semantic content. For example, verbs such as "eat" and "drink" were grouped together and, to better summarize such activities, named "eating". Activities that can also be carried out in daily life, such as "check in" and "walking around", were named "leisure activities". For reasons of space, the other word categories are not explained in detail. Verbs reflect the main behaviors of residents participating in festivals. Judging from word frequency, the behaviors of Beijing residents participating in festivals appeared relatively uniform across festival types (Figure 6). For example, words such as "eat" and "check in" indicate that the main behaviors of residents participating in festivals were associated with dining. "Checking in" at online celebrity shops appears to have become an important way for Beijing residents to participate in festivals.
Adjectives mainly reflect the emotional expression of residents towards festivals, and different types of festivals corresponded to different emotional expressions. "Ching Ming" in traditional festivals corresponded to the Ching Ming Festival. Words such as "peaceful", "smooth", and "consummate" were cultural manifestations of traditional festivals. The word "peaceful" appeared most frequently in foreign festivals, which corresponded to people's wish for peace on Christmas Eve. High-frequency adjectives used for modern festivals reflected residents' focus on National Day, expressing pride in the motherland and giving positive comments on its current state, with adjectives such as "safe", "strong", and "prosperity".
Figure 7 shows the kernel density distribution map of Beijing residents' Weibo posts related to festivals in 2019, overall and by festival type. The density distribution of traditional festivals was not much different from that of modern festivals, although the central density of residents' postings related to traditional festivals was higher than that of modern festivals. The density of foreign festivals appeared much lower than that of either traditional or modern festivals, and there appeared to be many areas with no posts, suggesting that traditional festivals still occupy the main position in Chinese residents' holiday behavior and culture. This contradicts the perception that traditional festivals are being significantly impacted and influenced by foreign festivals.

Theme Sensing of Festival Activities
Among the 29 festivals in 2019, the LDA theme model divided the festival-related posts into three types: the emotional expression of the posts; the specific behavior of residents; and the representative culture of the related festival. Residents' festival activities were roughly divided into two categories: eating food with relatives and friends and going to various restaurants to check-in; going to multiple tourist attractions and festival activities. The LDA model analysis was applied to the three festival types; modern, traditional, and foreign, and results were imported into ArcGIS for thematic spatial analysis.
Of the five topics for traditional festivals, each was fairly evenly distributed in space, but topic 2 was the most widely distributed (Figure 8). As Table 5 shows, the high-frequency words of topic 2 mainly correspond to the Mid-Autumn Festival and the Spring Festival, such as "moon cake", "reunion", "year of pig", and "good luck".

The topic distribution of foreign festivals was not as wide as that of traditional festivals, but it showed obvious spatial differences (Figure 9). Topic 1 was mainly distributed outside the Fifth Ring Road in Beijing, and topic 4 mainly inside it. According to the high-frequency words in Table 6, topic 1 was mainly associated with residents' emotional perception and expression of the festival, with words such as "happiness", "hope", and "peace". Topic 4 was mainly related to specific behaviors of residents participating in festivals, with words such as "Christmas gifts" and "apples", indicating that residents celebrating Christmas mainly give gifts and apples to express their care for relatives and friends.
Topics 2 and 3 for modern festivals also showed significant spatial differences (Figure 10). According to Table 7, the high-frequency words of topic 2 included "Happy New Year", "Military Parade", "Hope", "Fireworks", and "Tiananmen Square". Some of these expressed residents' best wishes during the festival; others described the representative symbols and constituent elements of festivals, especially National Day. The high-frequency words of topic 3, such as "delicious", "check in", and "taste", were associated with food and eating.
Taken together with the spatial differences in the topics of foreign festivals, it can be inferred that the main way residents participated in festivals was related to the quality of local infrastructure. Residents living in the central city of Beijing can take part in a wide range of festival activities, so most of their Weibo content reflects specific festival behaviors. Residents living in the suburbs of Beijing may have been restricted by limited access to such infrastructure, so their posts more often expressed wishes about the festival or the cultural concept of the festival itself.
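The comparison of topics inside and outside the Fifth Ring Road can be sketched as a simple tally of each post's dominant topic by zone. The zone labels, the function name `topic_share_by_zone`, and the toy topic proportions below are hypothetical; the sketch assumes each post has already been flagged as inside or outside the ring road, for example by a point-in-polygon test in a GIS.

```python
from collections import Counter, defaultdict


def topic_share_by_zone(posts):
    """Share of posts per dominant topic within each zone.

    posts: iterable of (zone, topic_proportions) pairs, where zone is a label
    such as "inside" or "outside" and topic_proportions is a list of floats.
    Returns {zone: {topic_index: share_of_posts}}.
    """
    counts = defaultdict(Counter)
    for zone, proportions in posts:
        # The dominant topic is the one with the largest proportion.
        dominant = max(range(len(proportions)), key=proportions.__getitem__)
        counts[zone][dominant] += 1
    shares = {}
    for zone, counter in counts.items():
        total = sum(counter.values())
        shares[zone] = {topic: n / total for topic, n in sorted(counter.items())}
    return shares


# Toy posts (hypothetical): topic 0 = wishes and feelings, topic 1 = behaviors.
toy_posts = [
    ("inside", [0.1, 0.9]),
    ("inside", [0.2, 0.8]),
    ("inside", [0.7, 0.3]),
    ("outside", [0.8, 0.2]),
    ("outside", [0.6, 0.4]),
]
shares = topic_share_by_zone(toy_posts)
```

In the toy data, two thirds of the "inside" posts are dominated by the behavior topic while all "outside" posts are dominated by the wishes topic, mirroring the kind of contrast described above.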

Discussion
Most current research on festivals and culture is conducted through surveys and field trips and seldom uses big data to analyze related issues. Many scholars have therefore recognized the urgency of using social media data to study festival activities [4]. For example, Zhou's research mainly uses word frequency statistics and LDA theme models to identify residents' perceptions of traditional festivals and their regional differences [18]. Their results indicate that LDA topic classification is a powerful method for analyzing social media data, mining text, and revealing the spatiotemporal characteristics of related activities. Liu [41] examined the emotional characteristics of Chinese tourists to Australia based on big data text analysis and part-of-speech tagging. These methods all extend the textual analysis of festival activities. However, the above research lacks comprehensive mining of the rich semantic and spatiotemporal information in social media data. This research therefore uses NLP technology to identify festival-related Weibo posts, and combines word frequency statistics, text labeling, LDA theme models, and GIS spatial analysis to analyze residents' perceptions of festivals and activities.
Judith Mair and Karin Weber [3] pointed out that many studies in the field of festival analysis have adopted a case study approach. As a result, research on individual festivals is relatively abundant, but comprehensive comparative studies across many festivals are lacking, which limits the scope and scale of our understanding of festivals. By expanding the research to different types of festivals, we hope to improve the understanding of residents' perceptions of different festivals. Comparing festival types, the research found that Weibo texts reflect different levels of attention paid by residents to different festivals, with traditional festivals still receiving more widespread attention. The thematic analysis shows that different types of festivals share common characteristics: for example, attention to leisure activities and food is very prominent, and expressing greetings to family and friends through festivals is universal. There are also differences between types: traditional festivals are more closely related to history and culture, while modern festivals are more closely related to leisure and consumption. Western festivals have become more connected with consumption and entertainment while retaining some religious imprints. Such a comprehensive study is of great significance for an in-depth understanding of the connotations of festivals and of social and economic development.
On the spatial scale, this study found an interesting pattern in residents' festival activities in a megacity. Although the gathering areas for all types of festivals are concentrated in densely populated urban centers, the distribution ranges of traditional and modern festival activities are significantly larger than that of foreign (Western) festivals. We believe that the way residents participate in festivals is related to the level of infrastructure, especially the number of entertainment facilities such as catering and services. At the same time, the regional differences in festival activities within the city reflect the imbalance in the urban structure of Beijing: festival activities are more widely distributed in the northern part of the city than in the south (Figure 7) [42]. Wilson [4] emphasized the important role of festivals for local communities; increasing festival-related facilities in underdeveloped urban areas may likewise promote the balanced development of the city. However, this research also raises a new question: the difference between the east and the west of the city is also pronounced, and its causes need to be studied in depth.
The results of this research show that social media big data can be used to understand residents' perceptions of festivals. However, according to the 2020 Weibo User Development Report, people aged 20-30 account for close to 80% of Weibo users [23]. Social media data therefore mainly represent a relatively young group and suffer from sample bias and limited representativeness. To address this problem, further research can combine multiple data sources, such as traditional questionnaire surveys, to supplement the sample and compensate for this bias.

Conclusions
This study uses social media data to study residents' perceptions of festivals and the spatial characteristics of their activities. Using a text classification model based on the BERT and Transformers framework, we identified Weibo posts related to festivals in Beijing in 2019, and analyzed them with word frequency statistics, part-of-speech analysis, and LDA topic model analysis. We obtained Beijing residents' perceptions of festivals and the ways they participated in festivals, and explored the spatial differences in residents' participation in festival activities.
Traditional culture had a strong influence on festivals, reflected not only in residents' motivation to participate in festivals but also in the ways they participated and the feelings they expressed. Traditional festivals occupied the central position in residents' perception of festivals, contrary to current concerns that traditional festivals are being greatly affected by foreign festivals. Feelings for family and motherland occupied a central position in modern festivals, as was clearly shown in word frequency and in the spatial distribution of topics. For traditional festivals, residents expressed their feelings through ancient poems from traditional Chinese culture. For example, verses such as "On festive occasions more than ever one thinks of one's dear ones far away. (每逢佳节倍思亲)" and "Will live long as he can! (但愿人长久)" were used frequently for traditional festivals but not for other types of festivals. The way residents participate in festivals is related to the level of infrastructure, especially the number of entertainment facilities such as catering and services. Most Weibo posters in inner-city areas described specific festival-related behaviors, showing that they were directly participating in festival activities, while posters in outer-city areas more often expressed holiday wishes. Additionally, some of the residents' festival activities were related to religious beliefs, reflecting the cultural traditions and connotations behind different types of festivals.
The analysis of the spatial distribution of festival-related microblogs shows that the temporal and spatial information in social media data can help in understanding the characteristics of urban spatial structure. Residents' festival activities are concentrated in densely populated and economically developed urban centers. The differences in festival activities between the north and the south of the city are also in line with the characteristics of Beijing's urban spatial structure. However, this study found that the difference between the eastern and western parts of the city is also very obvious. This finding presents a new challenge: the reasons for the east-west differences in residents' activities need to be studied in depth.
By combining natural language processing technology, statistical analysis, part-of-speech tagging, topic analysis, and spatial analysis, this study provides a new paradigm for research in the field of festivals. However, the LDA topic model has certain shortcomings in processing sparse social media data, which will require subsequent advances in data processing technology. Social media data also suffer from sample bias and cannot adequately reflect the situation of middle-aged and elderly people, who use social media less. In follow-up research, traditional questionnaire surveys can be used to supplement the sample with multi-source data. The spatial differences in residents' festival activities found in this study can at present only be described qualitatively; we hope that future studies can explain the reasons for these differences quantitatively.

Data Availability Statement:
The data is available from the authors upon reasonable request.