Article

Topic Mining and Evolutionary Analysis of Urban Renewal Policy Texts in China

School of Urban Economics and Management, Beijing University of Civil Engineering and Architecture, Beijing 100044, China
*
Author to whom correspondence should be addressed.
Buildings 2025, 15(18), 3324; https://doi.org/10.3390/buildings15183324
Submission received: 1 August 2025 / Revised: 8 September 2025 / Accepted: 12 September 2025 / Published: 14 September 2025
(This article belongs to the Special Issue Large-Scale AI Models Across the Construction Lifecycle)

Abstract

In the context of China’s rapid urbanization and the era of stock planning, urban renewal policies play a significant role in enhancing urban quality and promoting sustainable development. To reveal the thematic structure and evolution of China’s urban renewal policy system, this study applies the BERTopic model to conduct semantic mining and evolutionary analysis on 1144 policy documents issued by central and local governments. The study identifies 34 distinct themes in urban renewal policies, grouped into five main directions: Spatial Improvement and Facility Upgrades, Project Collaboration and Approval, Land Acquisition and Compensation, Fiscal Incentives and Funding Support, and Institutional Guarantees and Governance. Each of these directions exhibits distinct evolutionary trends over time. While urban renewal policies in the Central, Western, Eastern, and Northeastern regions share common characteristics in key aspects such as land acquisition and compensation, funding assurance, and residential quality enhancement, they also reflect regional differences due to varying stages of development, economic conditions, and geographic factors. This demonstrates both the shared and distinct policy focus areas across different regions of China. By identifying underlying themes and their trajectories, this study provides insights into the structural characteristics of urban renewal policies and offers references for government authorities seeking to improve and optimize policy systems. It also offers China’s experience as a reference for urban renewal in other countries.

1. Introduction

According to United Nations projections, by 2050, 68% of the global population will reside in urban areas, with China’s urbanization process being particularly significant. It is estimated that China’s urban population will increase by 255 million by 2050 [1]. This trend imposes unprecedented pressures on urban development. As urbanization advances, cities face complex challenges such as resource inefficiency, environmental pollution, and aging infrastructure [2,3,4,5]. These issues not only hinder sustainable development but also directly affect the quality of life and well-being of urban residents [6]. To address these challenges, the United Nations’ New Urban Agenda and Sustainable Development Goal 11 advocate for building inclusive, safe, resilient, and sustainable cities and human settlements [7], highlighting the strategic importance of incorporating urban renewal into sustainable development agendas.
Urban renewal has been formally incorporated into China’s national core agenda and positioned as the strategic blueprint for urban development over the coming decades [8,9]. The 20th National Congress further emphasized accelerating the transformation of development models in megacities, promoting urban renewal initiatives, enhancing infrastructure construction, and building livable, resilient, and smart cities [10,11,12]. Urban renewal policies serve as institutional instruments for guiding resource allocation, regulating implementation pathways, and coordinating multiple stakeholders. The scientific and coordinated nature of these policy systems directly influences the effectiveness of urban renewal initiatives. Therefore, extracting meaningful themes, identifying inter-policy synergies, and analyzing evolutionary patterns from a vast and complex corpus of policy documents are essential for understanding the dynamics of policy development and improving policy effectiveness.
Recent academic studies on urban renewal policies have explored various dimensions. Ye et al. [13] examine the background factors driving policy innovation, highlighting the role of structural-instrumental, cultural-institutional, and environmental perspectives. Nachmany et al. [14] focus on the political economy and institutional dynamics influencing policy changes, while other work evaluates policy effectiveness through comparative studies such as those on Hong Kong and Macau [15]. Overall, few studies analyze the content of urban renewal policies quantitatively; most are limited to discussions of policy impact and effectiveness. As policy research increasingly moves toward quantitative and objective methodologies, topic modeling has emerged as a powerful tool for textual analysis. Among topic models, BERTopic has gained prominence for generating more coherent and interpretable topics. It has been applied in various domains, including government reports [16], social media analysis [17], policy evaluation [18], and general policy text mining [19]. Hence, BERTopic offers a promising approach for exploring semantic-level themes and policy orientations. This study analyzes urban renewal policy documents issued by central and local governments in China. By applying the BERTopic model at the sentence level, it aims to identify granular themes and uncover their temporal evolution. The goal is to depict the developmental characteristics and internal logic of urban renewal policies in China, thereby offering practical references for evidence-based policy formulation.

2. Literature Review

2.1. Current Research on Urban Renewal Policies

Urban renewal policy research has emerged as a critical subfield in urban planning and development. Pan et al. [20] conducted a systematic literature review and content analysis to establish an analytical framework encompassing strategic and instrumental dimensions. Their study reviewed major urban renewal policies in Shenzhen, revealing both the emphasized and neglected areas and providing policy recommendations. Min et al. [21] analyzed 427 policy documents in Shanghai, introducing a “supply–environment–demand–pattern–method” framework and applying dynamic topic modeling to explore thematic evolution. From a bibliometric perspective, Lyu et al. [22] employed data from the Web of Science and used statistical and co-occurrence analysis to identify cutting-edge themes and research trends in urban renewal. Hu et al. [23], drawing on the theory of policy entrepreneurship, investigated the mechanisms and processes of policy change in the context of China’s urban redevelopment.
In recent years, the application of natural language processing (NLP) technology in urban policy analysis has been continuously expanding. Chen K. et al. [24] used NLP to analyze public sentiment toward urban renewal in China, while Jiang L. et al. [25], using large-scale datasets, employed NLP technology to explore public attitudes toward urban village renovation projects, providing references for policy optimization. At the level of cross-regional policy coordination, Morandell T. et al. [26] used NLP to quantitatively compare the municipal spatial planning contents of 257 cities across 125 regions, offering guidance for policy coordination in local planning across Europe. Additionally, Dong X. et al. [27] built a low-carbon policy intensity index through NLP algorithms and text-based prompt learning, comprehensively quantifying China’s urban low-carbon policies. Hu W. et al. [28] applied NLP to categorize and index keywords in China’s smart city policies, combining traditional statistical methods and text mining techniques to assess the development of smart cities in China. These studies fully demonstrate the effectiveness of NLP techniques in processing complex urban policy texts and uncovering implicit semantic associations, thereby providing multidimensional methodological references for this research.

2.2. Application of Topic Models in Policy Text Analysis

Topic models use word frequencies to uncover hidden themes and have become key tools in policy research [29]. Traditional models such as Latent Dirichlet Allocation (LDA) and Non-negative Matrix Factorization (NMF) are bag-of-words models that focus on word frequencies while ignoring word order and contextual semantics. This limits topic consistency and interpretability [30], especially in the fast-evolving field of policy research [31]. LDA requires manual selection of the number of topics and is sensitive to parameter initialization [32]. While NMF offers better interpretability, it depends heavily on initial values and represents words using linear methods, which reduces the stability and reproducibility of the results and can lead to suboptimal word embeddings [33,34]. Top2Vec, on the other hand, combines document and word semantic embeddings to identify topics in vector space. It does not rely on bag-of-words methods and automatically determines the number of topics [35], but the topics it identifies tend to overlap and encompass multiple concepts [36].
To overcome these limitations, Grootendorst M. [37] proposed the BERTopic model in 2022. The model uses the BERT language model to generate deep semantic vectors, which allows for more coherent and interpretable topics. Compared to traditional frequency-based methods like LDA, BERTopic distinguishes topics through text clustering, significantly reducing manual labeling costs while achieving higher accuracy and comprehensiveness in capturing policy priorities [38]. It has been reported to provide more accurate and context-rich topic representations than LDA, NMF, and Top2Vec, improving the interpretability and efficiency of news article analysis [39], and has been considered the preferred method for analyzing lengthy news documents [40]. These findings suggest that BERTopic can likewise incorporate contextual semantics when analyzing policy texts. Egger R. et al. [41] demonstrated that, compared to traditional topic modeling methods, BERTopic can encode contextual information, enabling it to capture semantic meanings and context-aware topic representations in textual data more effectively, which supports its use for the semantic understanding of policies.
The BERTopic model has been applied across various policy domains. For instance, Sheng et al. [30] integrated BERTopic, social network analysis, and the STIRPAT model to analyze regional policy coordination and carbon reduction outcomes. In finance, Li et al. [18] used BERTopic to cluster 3439 academic articles and 181 central-level policy texts on supply chain finance, further evaluating policy coherence using the PMC-Index model. In infrastructure studies, Hong et al. [42] employed BERTopic to identify themes and trends in infrastructure policy and proposed a strategic policy roadmap.

2.3. Research Commentary

In summary, while significant progress has been made in analyzing urban renewal policies, there remains room for methodological and thematic enhancement. Most existing studies rely on traditional quantitative approaches and focus on macro-level summaries, lacking deep semantic exploration of policy texts. The precision of theme granularity and systematic characterization of inter-topic relationships are areas requiring further development. Although topic modeling has gained traction in policy text analysis, applications in urban renewal policy remain sparse. Compared to conventional models like LDA, BERTopic offers notable advantages in semantic mining and topic coherence. This study innovatively incorporates BERTopic into urban renewal policy analysis, leveraging its advanced features to achieve more accurate topic identification and systematic thematic interpretation.

3. Materials and Methods

3.1. Overall Research Framework

This study performs topic identification and evolutionary analysis on urban renewal policy texts in China. The research steps for this study are shown in Figure 1.
(1) Data collection and cleaning: Urban renewal policy documents issued by central and local (provincial and municipal) governments in China were collected using keyword searches such as urban renewal, old residential community renovation, urban village transformation, shantytown redevelopment, and dilapidated housing reconstruction. A Python-based crawler was developed to download the relevant policy texts. To improve model performance, the collected policy documents were segmented into individual sentences. The Jieba library was employed to conduct Chinese word segmentation, remove stopwords, and clean invalid or redundant content. This process transformed long, unstructured policy texts into a structured corpus of short texts suitable for model input.
(2) Topic modeling: The BERTopic model was trained on the preprocessed short-text corpus. Topics were extracted from the urban renewal policy sentences, and visualizations were generated using dimensionality reduction algorithms. Identified topics were clustered according to semantic similarity, and each cluster was interpreted through qualitative content analysis to reveal thematic orientation.
(3) Multi-perspective analysis: Themes were identified and clustered for national urban renewal policies, and theme intensity was computed to reveal how policy priorities shifted over time. In addition, a comparative analysis summarized the commonalities and differences in theme setting, content coverage, and other aspects of urban renewal policies across China’s four major economic regions.

3.2. Data Source and Selection

The policy texts analyzed in this study were obtained from the PKULaw database [43,44] (https://www.pkulaw.com/, accessed on 15 June 2025). Our data collection process complied with relevant Chinese regulations and the terms of service of the PKULaw website. The search covered the period from 1 January 2000 to 31 May 2025, using the keywords urban renewal, old residential community renovation, urban village transformation, shantytown redevelopment, and dilapidated housing reconstruction to retrieve relevant policy titles. A total of 1568 policy documents were retrieved, including 83 issued by central government agencies and 1485 by local governments. The following inclusion criteria were applied for data screening: (1) the document must be a current or soon-to-be-effective policy issued by the central or local government; (2) the document must be an official policy type, including laws, regulations, provisions, opinions, notices, and methods, excluding informal decisions such as approvals, responses, or public announcements; (3) the policy content must be directly related to urban renewal. After screening, 1144 documents were retained as the final dataset for analysis (see Supplementary Materials). Notably, although China’s urban renewal practice began earlier, policies issued before 2000 were sparse and scattered. Therefore, this study focuses on policies issued from 2000 onward, which more systematically represent urban renewal efforts in China.

3.3. Text Preprocessing

Before performing BERTopic topic modeling, we conducted text preprocessing on the raw data to ensure the quality and relevance of the dataset, thereby improving data integrity and reducing computational overhead. This study followed the data preprocessing methods proposed by Jelodar H. [32] and Su Y. S. [33], and the text cleaning and optimization were completed through the following four steps:
(1) Removing duplicate texts: We manually reviewed the unstructured text data to identify and remove duplicate articles, ensuring the uniqueness of each text in the dataset. This step helped avoid bias in the analysis results caused by duplicate data.
(2) Removing irrelevant texts: By combining manual review with keyword filtering, we excluded texts that were clearly unrelated to the theme of “urban renewal”. This step focused the research scope, enhancing the accuracy and interpretability of subsequent topic modeling.
(3) Cleaning unrelated elements: We systematically removed HTML tags, URL links, special symbols, and pure numeric characters from the text. These elements typically did not carry semantic meaning, and their removal helped reduce noise and interference in the natural language processing process.
(4) Tokenization and stopword removal: We used the widely adopted Chinese text segmentation tool Jieba to tokenize the cleaned text, breaking continuous text into the smallest units with independent meanings. To accommodate the specialized terminology in the urban renewal field, we incorporated a custom domain-specific dictionary (e.g., “old residential areas”, “urban health checks”, “age-friendly renovation”, “micro-renovation”) during tokenization to prevent incorrect splitting of professional terms. Additionally, a stopword list was constructed from the Harbin Institute of Technology’s stopword list, specific terms such as names of people and places, special characters such as “@” and “#”, emojis, and custom stopwords (e.g., “second phase”, “three years”, “a batch”, “mentioned above”) to eliminate irrelevant words.
It is important to note that during tokenization, we fully considered the characteristics of specialized terminology in the urban renewal field. If the tokenization results were inaccurate (e.g., failing to correctly identify compound terms like “old residential areas” or “micro-renovation”), it could lead to deviations in subsequent topic feature extraction. Core concepts might be fragmented into meaningless word fragments, reducing the significance of key terms and negatively affecting the quality and interpretability of the topic model. To address this, we imported the custom domain-specific dictionary and conducted iterative adjustments to maximize the accuracy of tokenization, ensuring the reliability of the analysis results.
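The cleaning, sentence segmentation, and token filtering steps above can be sketched as follows. This is a minimal illustration and not the authors' pipeline: the regular expressions, the function names, and the stopword set are assumptions. The tokenizer is passed in as a parameter; in practice one would presumably pass `jieba.lcut` after loading the domain dictionary with `jieba.load_userdict`, while the toy call in the usage note uses whitespace splitting so the sketch runs without third-party packages.

```python
import re

# Assumed patterns: HTML tags, ASCII URLs, Chinese sentence-final punctuation, pure numbers.
TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://[A-Za-z0-9._~:/?#@!$&'()*+,;=%-]+")
SENT_BOUNDARY = re.compile(r"(?<=[。！？；])")
NUMERIC = re.compile(r"\d+(?:\.\d+)?")


def split_sentences(document):
    """Remove tags and URLs, then split after sentence-final punctuation."""
    cleaned = URL.sub("", TAG.sub("", document))
    return [s.strip() for s in SENT_BOUNDARY.split(cleaned) if s.strip()]


def to_tokens(sentence, tokenize, stopwords):
    """Tokenize one sentence and drop stopwords, pure numbers, and symbol-only tokens."""
    kept = []
    for tok in tokenize(sentence):
        tok = tok.strip()
        if not tok or tok in stopwords or NUMERIC.fullmatch(tok):
            continue
        if not any(ch.isalnum() for ch in tok):  # punctuation, "@", "#", emojis
            continue
        kept.append(tok)
    return kept


def preprocess(document, tokenize, stopwords):
    """Turn one long policy document into a list of token lists, one per sentence."""
    return [to_tokens(s, tokenize, stopwords) for s in split_sentences(document)]
```

For example, `preprocess("<p>推进 老旧小区 改造 2025 。</p>", str.split, {"的"})` keeps the three content tokens of the single sentence and discards the number and the punctuation mark.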

3.4. BERTopic Modeling

In recent years, topic models have been widely applied in policy text analysis due to their ability to identify the underlying semantic structures and latent topics in texts [45]. The policy documents analyzed in this study are relatively long, and encoding long, multi-topic documents into a single embedding may lose information from the input [46]. Jin J. et al. [38], through sentence-level processing of government work reports, found that BERTopic handled the semantic complexity and ambiguity of texts more precisely. Li Q. et al. [47] segmented news texts into sentence-level units and obtained better topic consistency and coherence. Therefore, this study adopts sentence-level analysis. Each sentence is treated as an independent unit, and sentences are clustered based on their semantic and contextual features. This approach enables a clearer understanding of topics with contextual characteristics, presenting the multidimensional nature of the documents and improving the accuracy of topic clustering.
This study employs BERTopic [37], a deep learning-based topic modeling technique, to identify the latent topics embedded in China’s central and local urban renewal policy documents. BERTopic uses pre-trained BERT embeddings to capture rich semantic representations of text, making it particularly effective for context-sensitive topic modeling. The existing research using BERTopic for topic modeling generally follows these key steps [48]:
1. Embedding: This step involves converting natural language into a format that can be effectively processed by computers. The algorithm uses BERT or Sentence Transformers along with pre-trained language models to generate document embeddings from a set of documents. To extract sentence representations, we use the “bert-base-chinese” [49] sentence transformer as the default choice within the BERTopic framework. This model is specifically designed for Chinese tasks and optimized for semantic similarity, converting documents into numerical representations to create embeddings that capture the textual semantics.
2. Dimensionality reduction: The high-dimensional embeddings are reduced using UMAP (Uniform Manifold Approximation and Projection) to preserve semantic relationships while minimizing computational complexity [50]. UMAP is well-suited for maintaining the global and local structure of the data during projection.
3. Clustering: The HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise) algorithm clusters the reduced embeddings based on density and semantic similarity. HDBSCAN excels at detecting clusters of arbitrary shape and handling noise, assigning documents to core or peripheral topic clusters accordingly [51].
4. Topic representation: To extract representative keywords for each topic, the class-based TF-IDF (c-TF-IDF) algorithm is applied, which calculates the term importance within each cluster. Terms with the highest c-TF-IDF scores are considered core descriptors of the topic [52]. The c-TF-IDF formula is defined as follows:
$$W_{x,c} = tf_{x,c} \times \log\left(1 + \frac{A}{f_x}\right)$$
where $tf_{x,c}$ is the frequency of word $x$ in class $c$, $f_x$ is the frequency of word $x$ across all classes, and $A$ is the average number of words per class.
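As a worked illustration of the c-TF-IDF weighting, the sketch below applies the formula literally to token lists grouped by cluster. It uses raw term counts for the term frequency, as the formula is stated here; library implementations may normalize the counts first, so the absolute values can differ. The function name and the toy vocabulary are assumptions made for illustration.

```python
import math
from collections import Counter


def c_tf_idf(classes):
    """classes maps a class id to the list of all tokens assigned to that class.

    Returns {class id: {word: weight}}, where weight = tf * log(1 + A / f).
    """
    tf = {c: Counter(tokens) for c, tokens in classes.items()}
    f = Counter()  # frequency of each word across all classes
    for counts in tf.values():
        f.update(counts)
    avg_words = sum(f.values()) / len(classes)  # A: average number of words per class
    return {
        c: {x: n * math.log(1 + avg_words / f[x]) for x, n in counts.items()}
        for c, counts in tf.items()
    }


weights = c_tf_idf({
    0: ["land", "land", "compensation"],
    1: ["fund", "subsidy", "land"],
})
# Rank the terms of class 1 by weight; "land" is shared with class 0 and ranks last.
ranked = sorted(weights[1], key=weights[1].get, reverse=True)
```

A word that occurs in several classes gets a smaller logarithmic factor than a word confined to one class, which is why class-specific terms rise to the top of each topic's keyword list.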
After obtaining the latent topics of the texts using BERTopic, we calculated the topic intensity to represent the relative weight of this topic in the entire policy text. The specific calculation formula is as follows:
$$P_k = \frac{\sum_{i=1}^{N} \theta_{ki}}{N}$$
where $P_k$ denotes the intensity of Topic $k$, $N$ represents the total number of sentences, and $\theta_{ki}$ indicates the probability value of Topic $k$ in the $i$-th sentence.
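The intensity formula is a column mean of the sentence-by-topic probability matrix. A minimal sketch follows, assuming the probabilities are available as a list of rows, for instance the per-document probabilities that BERTopic returns from `fit_transform` when `calculate_probabilities=True`. Whether the study obtained them this way is not stated, and the numbers below are invented.

```python
def topic_intensity(theta):
    """theta[i][k] is the probability of topic k in sentence i.

    Returns the list [P_0, ..., P_{K-1}] with P_k = sum_i theta[i][k] / N.
    """
    n_sentences = len(theta)
    n_topics = len(theta[0])
    return [sum(row[k] for row in theta) / n_sentences for k in range(n_topics)]


# Three sentences, two topics (illustrative values only).
intensity = topic_intensity([[0.8, 0.2], [0.4, 0.6], [0.0, 1.0]])
```

Because each row of the example sums to one, the resulting intensities also sum to one and can be read as shares of the corpus.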

3.5. BERTopic Parameter Configuration and Validation

In this study, the bert-base-chinese pre-trained sentence transformer was used for text embedding within the BERTopic model, with UMAP employed for dimensionality reduction. The parameters were set as follows: n_neighbors = 120, min_dist = 0.0, n_components = 5, and metric = “cosine”. The HDBSCAN clustering configuration was set to min_cluster_size = 180 and metric = “euclidean”. The number of topics was not pre-specified manually but was automatically inferred by the algorithm.
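The reported settings could be wired into BERTopic roughly as shown below. This is a sketch under assumptions and not the authors' script: `random_state`, `prediction_data=True`, `calculate_probabilities=True`, and the way the `bert-base-chinese` checkpoint is wrapped in `SentenceTransformer` are choices made here for reproducibility and for obtaining per-sentence topic probabilities; none of them is given in the text. The third-party imports sit inside the function so the parameter dictionaries can be inspected without installing the packages.

```python
# Values reported in the text.
UMAP_PARAMS = {"n_neighbors": 120, "min_dist": 0.0, "n_components": 5, "metric": "cosine"}
HDBSCAN_PARAMS = {"min_cluster_size": 180, "metric": "euclidean"}
EMBEDDING_MODEL = "bert-base-chinese"


def build_topic_model(random_state=42):
    """Assemble a BERTopic model from the reported UMAP and HDBSCAN settings."""
    # Lazy imports: bertopic, umap-learn, hdbscan, sentence-transformers must be installed.
    from bertopic import BERTopic
    from hdbscan import HDBSCAN
    from sentence_transformers import SentenceTransformer
    from umap import UMAP

    umap_model = UMAP(random_state=random_state, **UMAP_PARAMS)  # seed is an assumption
    hdbscan_model = HDBSCAN(prediction_data=True, **HDBSCAN_PARAMS)  # assumption
    return BERTopic(
        embedding_model=SentenceTransformer(EMBEDDING_MODEL),
        umap_model=umap_model,
        hdbscan_model=hdbscan_model,
        calculate_probabilities=True,  # assumption: needed for topic intensity
        nr_topics=None,  # no manual topic count; clusters come from HDBSCAN
    )
```

Calling `build_topic_model().fit_transform(sentences)` on the preprocessed sentence strings would then return topic assignments and probabilities. With `min_cluster_size = 180`, only themes supported by at least 180 sentences can form a topic, which keeps the topic count small for a corpus of this size.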
The topic coherence score was the core quantitative metric for evaluating the quality of the topic model. It measures the strength of semantic association among the high-ranking terms within a topic, reflecting how humans perceive well-structured topics. In other words, it evaluates the interrelationship between the top k words in a given topic. Higher coherence scores indicate that a topic is cohesive, clear, and relevant, while lower scores suggest a lack of clarity, presence of noise, or irrelevance [53]. To validate the necessity of sentence-level segmentation, we compared models built at the sentence level and at the full-text level. The full-text model had a $C_V$ coherence score of 0.389, whereas the sentence-level model used in this study achieved a $C_V$ coherence score of 0.656, indicating that sentence-level segmentation yields markedly more coherent topics.
To evaluate the accuracy and separability of document clustering, the Davies–Bouldin Index (DBI) was used, a well-established method for internal validation. This index identified the least favorable group pairs by utilizing within-group variance and between-group centroid distances [54]. The DBI for this model was 0.313, indicating good separation between topic clusters.
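For readers unfamiliar with the index, the sketch below computes the Davies–Bouldin Index from scratch for clusters given as lists of points. Lower values indicate more compact and better separated clusters. The source does not say in which space the index was computed or how HDBSCAN noise points were treated, so the toy coordinates are purely illustrative. In practice `sklearn.metrics.davies_bouldin_score` offers the same measure.

```python
import math


def _centroid(points):
    """Coordinate-wise mean of a list of equal-length points."""
    dim = len(points[0])
    return [sum(p[d] for p in points) / len(points) for d in range(dim)]


def davies_bouldin(clusters):
    """clusters is a list of clusters, each a list of coordinate tuples."""
    centroids = [_centroid(c) for c in clusters]
    # Scatter: mean distance of the members of a cluster to its centroid.
    scatter = [
        sum(math.dist(p, m) for p in c) / len(c)
        for c, m in zip(clusters, centroids)
    ]
    k = len(clusters)
    total = 0.0
    for i in range(k):
        # Worst (largest) similarity ratio between cluster i and any other cluster.
        total += max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in range(k) if j != i
        )
    return total / k


dbi = davies_bouldin([[(0, 0), (0, 2)], [(10, 0), (10, 2)]])
```

In the toy case each cluster has scatter 1 and the centroids are 10 apart, so the index is (1 + 1) / 10 = 0.2.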
To ensure the accuracy and reliability of the machine learning results, we conducted a thorough manual review of the generated topics and their related policies. A team of three experts carefully assessed the coherence and relevance of each topic, reviewing the keywords and policies. To ensure reliability among the evaluators, we applied Fleiss’ kappa, a metric used to evaluate the consistency of three or more raters on categorical variables [55]. Our analysis yielded a kappa value of 0.63 for the 34 topics, indicating a substantial level of agreement among the raters, ensuring that the unsupervised model’s output was meaningful and insightful. These metrics collectively supported the coherence, robustness, and semantic validity of the topic structure derived through BERTopic.
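Fleiss' kappa can be computed directly from a table that counts, for each topic, how many raters chose each category. The sketch below is a generic implementation. The two-category layout in the example (for instance "coherent" versus "not coherent") and the counts are hypothetical, since the rating scheme used by the three experts is not described.

```python
def fleiss_kappa(counts):
    """counts[i][j] is the number of raters who assigned item i to category j.

    Every row must sum to the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])
    # Share of all assignments that fall into each category.
    p_cat = [sum(row[j] for row in counts) / (n_items * n_raters) for j in range(n_cats)]
    # Observed agreement per item.
    p_item = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_item) / n_items  # mean observed agreement
    p_exp = sum(p * p for p in p_cat)  # agreement expected by chance
    return (p_bar - p_exp) / (1 - p_exp)


# Four items rated by three raters into two categories (hypothetical counts).
kappa = fleiss_kappa([[3, 0], [0, 3], [2, 1], [1, 2]])
```

The example gives a mean observed agreement of 2/3 against a chance agreement of 1/2, hence a kappa of 1/3; unanimous ratings on every item would give 1.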

4. Results and Discussion

4.1. Overall Identification of Policy Themes

Figure 2 illustrates the declining trend in keyword importance weights within each topic. It shows that for most topics, a few high-ranking keywords carried the greatest representational weight, while the importance of other keywords declined rapidly. This suggests that each topic could be effectively represented by approximately three to five core keywords, with additional keywords providing only marginal improvements in interpretability.
We conducted topic identification on Chinese urban renewal policy texts using the BERTopic model. After multiple experimental iterations, we ultimately identified 34 topics (labeled Topic 0 to Topic 33). Each identified topic was represented by a set of characteristic terms with different weights, which collectively illustrated the main content of the topic, as detailed in Table 1.
The topics reveal four main areas of policy focus. (1) Infrastructure and renovation of old residential areas: Topics 0, 1, 14, and 26 show that policy addresses the core needs of residents’ daily lives by upgrading community facilities, promoting the renovation of old residential areas and urban villages, and incorporating infrastructure construction into resettlement projects. The goal is to improve living conditions, enhance urban functions, and ensure the effective implementation of renovation tasks. (2) Land acquisition and resettlement compensation: Topics 4, 9, 22, and 27 indicate that policy aims to balance urban development with the rights of residents. By standardizing the land acquisition process and clarifying compensation standards, it seeks to keep land acquisition lawful and orderly. (3) Financial support and diverse fundraising: Topics 3, 20, and 32 show that policy broadens funding sources through financial subsidies, tax incentives, and loan support, and encourages social capital to participate in urban renewal projects. (4) Policy norms and execution assurance: Topics 6, 8, 15, and 25 stress that urban renewal plans and policies are defined through processes such as issuance, formulation, review, and approval, so that all tasks are carried out in a structured manner. In addition, Topics 10, 21, and 33 show that policy strengthens accountability through legal means such as penalizing violations, clarifying civil liabilities, compulsory enforcement by courts, and administrative litigation, ensuring compliance in policy execution.

4.2. Overall Identification of Policy Directions

To further understand the relationships among identified topics, the study conducted visual analyses using a topic distance map and a cosine similarity heatmap. These tools helped to uncover the latent structure of urban renewal policy discourse. Figure 3 presents a 2D topic distance map alongside a cosine similarity heatmap; Figure 4 shows the topic hierarchical clustering diagram. The topic distance map applied dimensionality reduction to project the 34 topics onto a two-dimensional plane. In this map, the size of each circle indicated the relative importance of the topic, and the spatial proximity between circles reflected semantic similarity. The cosine similarity heatmap revealed pairwise semantic correlations among the 34 topics. Darker shades of blue indicated stronger similarity (values closer to 1), while lighter shades represented weaker associations (values closer to 0). The heatmap confirmed the internal coherence of certain topic clusters and the divergence among others. These findings reinforced the reliability of the BERTopic clustering results and validated the natural grouping of topics. Together, these visual analytics revealed the semantic structure underlying China’s urban renewal policy discourse. They demonstrated the systematic and hierarchical nature of policy formulation, offering insights into how thematic priorities evolved and interconnected within the broader urban governance framework.
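The pairwise similarities behind such a heatmap can be sketched as a cosine similarity matrix over one vector per topic. Which topic vectors were used (topic embeddings or c-TF-IDF rows) is not stated, so the three toy vectors below are assumptions. BERTopic also ships plotting helpers such as `visualize_topics`, `visualize_heatmap`, and `visualize_hierarchy`, though the text does not say whether Figures 3 and 4 were produced with them.

```python
import math


def cosine_similarity_matrix(vectors):
    """Return the matrix S with S[a][b] = cos(vectors[a], vectors[b])."""
    norms = [math.sqrt(sum(x * x for x in v)) for v in vectors]
    return [
        [sum(a * b for a, b in zip(u, v)) / (nu * nv) for v, nv in zip(vectors, norms)]
        for u, nu in zip(vectors, norms)
    ]


# Three hypothetical topic vectors: the first two are orthogonal, the third lies between them.
sim = cosine_similarity_matrix([(1.0, 0.0), (0.0, 1.0), (1.0, 1.0)])
```

Each diagonal entry is 1, and entries close to 1 off the diagonal mark topics that a hierarchical clustering would merge early.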
Based on the clustering results, evaluation metrics, and expert judgment, this study categorizes the 34 topics into five overarching dimensions: Spatial Improvement and Facility Upgrades, Project Collaboration and Approval, Land Acquisition and Compensation, Fiscal Incentives and Funding Support, and Institutional Guarantees and Long-term Governance. Table 2 presents the number and proportion of relevant sentences for the 34 identified topics. The data show that urban renewal policy texts have distinct areas of focus.
Spatial Improvement and Facility Upgrades: With a proportion of 35.48%, this theme indicates that optimizing physical space layout and improving infrastructure and public services are key tasks in the urban renewal process. These factors are directly related to enhancing urban functions and improving residents’ quality of life, representing a critical aspect of the “hard power” enhancement in urban renewal.
Project Collaboration and Approval: At 30.08%, this theme highlights the complexity of urban renewal as a systemic project that heavily relies on coordination among multiple stakeholders and efficient approval processes. The policy emphasizes breaking down departmental barriers, integrating resources from enterprises, governments, and communities, simplifying approval procedures, and improving efficiency. This approach aims to remove institutional obstacles and ensure the smooth implementation of urban renewal projects.
Land Acquisition and Compensation: This theme accounts for 15.56%, reflecting the focus on safeguarding people’s livelihood rights during urban renewal. While advancing urban space restructuring and functional upgrades, it is essential to coordinate the interests involved in the land requisition process. Reasonable compensation and resettlement policies protect the rights of those affected by land acquisition and support the social stability of the renewal process.
Fiscal Incentives and Funding Support: With a proportion of 13.51%, this theme highlights the policy’s focus on securing financial support for urban renewal. Because urban renewal projects often involve large investments and long payback periods, a single funding source is insufficient. The policy addresses this by establishing diversified financial and tax incentive mechanisms, such as subsidies, tax benefits, and special funds, to attract social capital, providing continuous financial support for projects and promoting the sustainable development of urban renewal.
Institutional Guarantees and Long-term Governance: With a relatively low proportion of 5.37%, this theme suggests that current policies focus more on specific implementation aspects, while institution building and sustainable governance mechanisms still receive limited attention.

4.3. Overall Analysis of the Evolutionary Trends of Policy Themes

To further explore the evolution of the thematic intensity of urban renewal policies in China, this study conducted an evolutionary analysis of the topics generated by the BERTopic model. The data were sliced by year to show the evolution of five key policy themes in urban renewal over the past 25 years. This allowed for a clearer understanding of the changing trends in thematic intensity and government focus. The topic strength distribution of urban renewal policies over the years is shown in Table 3, and the trends were analyzed through the thematic intensity trend line (Figure 5).
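The yearly slicing described above can be sketched as follows. This is a minimal illustration that assumes topic strength in a given year is the share of that year’s sentences assigned to a topic; the study’s exact definition may differ. The function name and the toy records are hypothetical and are not drawn from the policy corpus.

```python
from collections import Counter, defaultdict


def yearly_topic_strength(records):
    """Compute per-year topic shares from (year, topic_id) pairs.

    Each pair stands for one policy sentence. The result maps
    year -> {topic_id: share}, where share is the fraction of that year's
    sentences assigned to the topic (an assumed definition of strength).
    """
    counts = defaultdict(Counter)
    for year, topic in records:
        counts[year][topic] += 1

    strength = {}
    for year, topic_counts in sorted(counts.items()):
        total = sum(topic_counts.values())
        strength[year] = {t: c / total for t, c in sorted(topic_counts.items())}
    return strength


# Hypothetical toy data: four sentences per year.
toy_records = [
    (2019, 0), (2019, 3), (2019, 3), (2019, 9),
    (2020, 0), (2020, 0), (2020, 0), (2020, 5),
]
strength = yearly_topic_strength(toy_records)
for year, shares in strength.items():
    print(year, shares)
```

Normalizing within each year keeps years with many policy documents from dominating the trend line, which is one reason a share-based measure is a common choice for this kind of comparison.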
1. Project Collaboration and Approval: The evolution of this theme aligns with the broader shift in China’s urban governance from government-led approaches to multi-stakeholder collaboration. Early high-intensity topics focus on vertical management processes such as planning review (Topic 15), reflecting the government-dominated, project-based system. Later, Pettitt’s change-point test detects a significant change in Topic 8 in 2019. This timing coincides with the release of the Central Committee’s decision on modernizing China’s system and capacity for governance, which emphasizes “building a well-structured, scientifically sound, and effective institutional system.” The change reflects increased attention to institutional design and multi-actor participation in urban–rural development, marking a transition toward more collaborative, institutionalized, and procedural governance.
2. Land Acquisition and Compensation: The evolution of this theme corresponds to the strategic shift in Chinese urban governance from large-scale new development to improved stock management and finer-grained governance. In the early 2000s, rising emphasis on topics such as housing expropriation and compensation (Topics 4 and 9) responded to the demands of rapid urbanization. The 2011 Regulation on the Expropriation and Compensation of Houses on State-owned Land emphasizes the public interest and the protection of residents’ rights. A change point detected in Topic 9 in 2020 aligns with the Ministry of Housing and Urban–Rural Development’s notice against excessive demolition and construction, a national policy that stresses controlled expropriation, renovation, and quality of life. The decline in Topic 9’s intensity indicates a move away from large-scale land redevelopment toward more regulated, rights-protecting, and refined governance.
3. Spatial Improvement and Facility Upgrades: This reflects China’s people-centered and sustainability-oriented development strategy. After 2010, under the “New Urbanization” initiative, policy attention shifted toward improving human living environments. The growing emphasis on facilities enhancement (Topic 0) and old residential neighborhoods (Topic 26) reflects this priority. From 2016 to 2020, the focus on old neighborhood renovation intensified, reinforced by national guidelines issued in 2020. This phase highlights organic regeneration of existing urban areas, social integration, public participation, and green development, demonstrating more refined and human-centered urban governance.
4. Fiscal Incentives and Funding Support: This reveals the evolution of financing mechanisms in China’s urban renewal. Around 2014, central government interventions such as monetary compensation in shantytown redevelopment and policy bank loans led to increased attention on lending (Topic 3). After 2020, policy encouraged private investment (Topic 5), reflecting a shift toward market-driven and sustainable renewal models. This trend aligns with broader market reforms and modernized governance, though challenges remain in balancing policy incentives with fiscal sustainability.
5. Institutional Guarantees and Long-term Governance: This theme reflects the deep transformation from “project-driven” renewal to “institutional empowerment”. In 2013, the Decision of the CPC Central Committee on Several Major Issues Concerning Comprehensively Deepening Reform first put forward the “modernization of national governance”. Under the strategy of comprehensively advancing the rule of law, the topic of project contracts (Topic 24) gained more attention, and the topic of civil liability (Topic 21) was further strengthened. This shows that policies focus on clarifying rights and responsibilities through standardized contracts and on ensuring implementation quality through professional qualifications. The Pettitt change-point test shows that Topic 30 had a change point in 2020, which matches the higher demand for project quality, liability tracing, and law-based supervision after urban renewal entered the large-scale implementation stage. National governance is thus shifting from “project-oriented thinking” to “institution-oriented thinking”, with the goal of reducing transaction risks and improving governance efficiency by building a stable legal and contractual framework. However, overall attention to institutional guarantees and long-term governance remains insufficient, and this gap urgently needs to be addressed through more substantive policy refinement and resource allocation.
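The change points cited above come from Pettitt’s test. The sketch below is a plain-Python version of the standard test with its usual approximate p-value; it is not the authors’ implementation, which the article does not show. Because the yearly series for Topics 8, 9, and 30 are not tabulated, the Topic 0 column of Table 3 is used purely as example input, and the function name is illustrative.

```python
import math


def pettitt_test(series):
    """Pettitt (1979) change-point test for a single shift in a series.

    Returns (t, k, p): t is the 0-based index of the first observation after
    the most likely change, k is the test statistic max|U_t|, and p is the
    usual approximate two-sided p-value.
    """
    n = len(series)
    best_t, best_k = 0, 0
    for t in range(1, n):
        # U_t sums sign(x_i - x_j) over all pairs with i before t and j from t on.
        u = 0
        for i in range(t):
            for j in range(t, n):
                diff = series[i] - series[j]
                u += (diff > 0) - (diff < 0)
        if abs(u) > best_k:
            best_t, best_k = t, abs(u)
    p = min(1.0, 2.0 * math.exp(-6.0 * best_k ** 2 / (n ** 3 + n ** 2)))
    return best_t, best_k, p


# Topic 0 strengths for 2000-2025, copied from Table 3 (example input only).
years = list(range(2000, 2026))
topic0 = [
    0.0248, 0.0205, 0.0506, 0.0808, 0.0478, 0.0488, 0.0205, 0.0589, 0.0257,
    0.0592, 0.0631, 0.0418, 0.0355, 0.0534, 0.0534, 0.0423, 0.0453, 0.0971,
    0.0533, 0.0658, 0.1247, 0.1207, 0.1247, 0.1200, 0.0800, 0.0927,
]
t, k, p = pettitt_test(topic0)
print(f"most likely change between {years[t - 1]} and {years[t]} (K={k}, p~{p:.4f})")
```

The test is rank-based, so it does not assume normally distributed topic strengths, which suits short annual series like these. It identifies at most one change point per series.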

4.4. Comparative Analysis of Policies in Different Regions

To gain deeper insight into policy focus and strategic differentiation under different development conditions, and to accurately capture government attention to urban renewal, this study groups provinces into China’s four major economic regions, following the regional classification in policies released by the State Council, as shown in Table 4.
Using the BERTopic model, this study identifies and analyzes the policy themes of urban renewal for the four major economic regions, with the results displayed in Figure 6. These results clearly show both similarities and distinctive regional characteristics in the thematic structure of urban renewal policies across the four regions.
Through a comparative analysis of the urban renewal policy themes across the four regions, the following common features emerge: (1) Land Acquisition and Compensation: All four regions have clear policy themes on land acquisition and resettlement compensation, reflecting a shared policy priority. In the Eastern region, Topic 3 and Topic 4 emphasize land requisition and resettlement, while in the Western region, Topic 5 and Topic 6 focus on compensation and relocation. In the Central region, Topic 0 and Topic 8 highlight similar themes, and in the Northeastern region, Topic 4 and Topic 7 also reflect the land acquisition and resettlement paradigm. This indicates that all regions focus on standardizing the requisition process to ensure the smooth integration of resources and protect the rights of affected groups. (2) Funding Support and Financing Innovation: Each region proposes fiscal and financial support mechanisms to meet the funding needs of urban renewal projects. In the Eastern region, Topic 5 and Topic 7 involve resettlement investment and subsidies, while in the Western region, Topic 7 and Topic 17 mention special loans and bond financing. The Central region’s Topic 11 and Topic 17 emphasize special funds and evaluation mechanisms, while the Northeastern region’s Topic 3 addresses monetization and financial arrangements. Despite their varying economic strengths, all regions have been actively exploring diverse funding paths and improving funding assurance mechanisms. (3) Facility Improvement and Quality Upgrades: This reflects the high priority placed on improving living conditions and responding to public demands. In the Eastern region, Topic 6 and Topic 11 emphasize functional optimization and upgrades to infrastructure such as fire safety; in the Western region, Topic 13 stresses improving housing conditions and housing quality. In the Central region, Topic 12 focuses on gas facilities in old residential areas, a critical public service need. In the Northeastern region, Topic 5 shows a clear policy focus on quality upgrades with clearly defined goals.
In terms of specific policy orientations, the economic regions show distinct characteristics: (1) Eastern Region: The focus is on fine-grained urban renewal management, market-based operations, and policy standardization. For example, Topic 0 reflects a refined approach to optimizing existing space, while Topic 14 emphasizes the market-oriented allocation of land resources. Topic 16 underscores the importance of policy rigor and standardization. (2) Western Region: The focus is on urban–rural integration and financial support, emphasizing both urban and rural spatial renewal and strengthening funding mechanisms. Topic 0 highlights the region’s emphasis on integrated rural–urban renewal, and Topics 7, 15, and 17 form a diverse financing system with special loans, purchasing in lieu of construction, and expanded social capital participation to ensure the smooth implementation of urban renewal projects. (3) Central Region: The focus is on public services and livelihood security. Topics 12 and 14 reflect the attention to residents’ daily needs, especially the renewal of aging infrastructure like gas facilities. Topics such as Topic 2 and Topic 15 reflect broader efforts to ensure improved urban quality and better public service delivery, enhancing the public’s sense of gain and well-being. (4) Northeastern Region: The focus is on the management of industrial heritage and subsiding areas. Topics 1 and 8 reflect the region’s efforts to address the issues of old industrial bases, including ecological restoration and exploring transformation paths through industrial heritage and cultural tourism integration.
The regional differences in urban renewal policies arise mainly from the following factors: (1) Development Stage Differences: The Eastern region has entered the post-industrialization stage, with limited room to expand construction land; the Central and Western regions are in a phase of accelerated urbanization; the Northeastern region, as an old industrial base, faces rapid industrial decline and needs to resolve legacy issues to create conditions for transformation and upgrading. (2) Economic Foundation and Fiscal Capacity: The Eastern region has a developed economy, strong local government finances, and abundant social capital; the Western region has a weaker economic foundation and limited fiscal capacity, and cannot rely solely on its own resources to push forward large-scale urban renewal; the Central region’s fiscal capacity lies between those of the Eastern and Western regions, and its urban renewal has significant social benefits; the Northeastern region faces downward economic pressure and weak fiscal support. (3) Geographical Conditions: The Eastern region benefits from a long coastline, superior ports, and geographic advantages that help attract global capital and talent. The Western region covers a vast area with a fragile ecology; it is rich in energy and mineral resources, but these are unevenly distributed, so urban renewal there needs to take ecological carrying capacity into account. The Central region, located inland, is a key transportation hub and population center, and focuses on ensuring basic livelihoods and the smooth flow of people and goods. The Northeastern region has rich ecological resources alongside a concentration of traditional energy and heavy industries.

5. Conclusions and Recommendations

This study analyzes 1144 urban renewal policy documents issued in China between 2000 and 2025 using the BERTopic model. It reveals their thematic evolution and regional variations, providing a basis for optimizing policy design.
Key Findings: (1) Policy theme structure: Through BERTopic model-based recognition and clustering, 34 sub-themes of urban renewal policy are extracted. These are further categorized into five core directions. This classification highlights both the systematic and hierarchical nature of urban renewal policies. (2) Theme evolutionary trends: China’s urban renewal governance paradigm has shifted from a government-led, incremental expansion model focused on engineering projects to a more collaborative, stock-optimization, and institutional empowerment approach. (3) Regional policy differences: The policies of different regions show both commonalities and distinct regional characteristics due to differences in development stages, economic conditions, and geographic contexts. The Eastern region emphasizes fine management and market operations, while the Western region stresses urban–rural coordination and diversified financing models. The Central region focuses on public services and livelihood improvement, and the Northeastern region places more attention on industrial heritage and ecological governance.
Policy implications and recommendations: (1) Strengthen multi-department collaborative governance. With housing authorities in the lead, coordinate departments such as natural resources and finance to create policy synergy. Streamline approvals, build a digital platform with performance metrics (e.g., utilization and online processing rates), and integrate project tracking for transparent, cost-effective processes. (2) Enhance funding and incentives. Establish a dedicated urban renewal fund that combines central and local finances and prioritizes public welfare projects. Use interest subsidies and guarantees to attract social capital for renovation and industrial development. (3) Tailor policies to regional disparities. The Eastern region should focus on refined management and smart, low-carbon renewal; the Central region on old community upgrades and public services; the Western region on urban–rural integration and ecological and cultural preservation; and the Northeastern region on revitalizing industrial heritage and managing subsidence areas for urban transformation.
Research contributions: (1) Moving beyond traditional manual coding or word-frequency methods, this study introduces the BERTopic model to urban renewal policy analysis, enabling sentence-level semantic mining and evolutionary tracking. This approach significantly improves the accuracy and depth of policy theme identification and offers a new technical pathway for quantitative policy text research. (2) A comparative analysis of policy texts across four major economic regions reveals both commonalities and differentiated features in policy orientation, providing a scientific basis for regionally coordinated and tailored policy design by central and local governments.
Research limitations: (1) This study does not deeply explore the spatial heterogeneity of urban renewal policies across different regions. (2) It has not fully verified the causal relationship between regional economic/fiscal differences and policy discourse.
Research prospects: Future research can integrate Geographic Information System (GIS) technology or spatial econometric methods to further reveal the differentiated characteristics of policy implementation in different regions and their spatial spillover effects. Meanwhile, it can use causal inference methods such as panel data analysis to conduct in-depth analysis of the shaping effect of economic foundations on policy orientations, thereby providing a more scientific basis for optimizing regional coordinated development strategies.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/buildings15183324/s1.

Author Contributions

Conceptualization, G.Z.; data curation, X.L.; methodology, X.L. and Q.L.; supervision, Q.L.; writing—original draft, X.L.; writing—review and editing, G.Z. and Q.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. UN 2018 Revision of World Urbanization Prospects. 2018. Available online: https://www.un.org/zh/desa/2018-revision-world-urbanization-prospects (accessed on 12 September 2022).
  2. Goldstone, J.A. The new population bomb: The four megatrends that will change the world. Foreign Aff. 2010, 89, 31. [Google Scholar]
  3. Wu, G.; Miao, Z.; Shao, S.; Jiang, K.; Geng, Y.; Li, D.; Liu, H. Evaluating the construction efficiencies of urban wastewater transportation and treatment capacity: Evidence from 70 megacities in China. Resour. Conserv. Recycl. 2018, 128, 373–381. [Google Scholar] [CrossRef]
  4. Wang, Y.; Li, X.; Kang, Y.; Chen, W.; Zhao, M.; Li, W. Analyzing the impact of urbanization quality on CO2 emissions: What can geographically weighted regression tell us? Renew. Sustain. Energy Rev. 2019, 104, 127–136. [Google Scholar] [CrossRef]
  5. Gerland, P.; Raftery, A.E.; Ševčíková, H.; Li, N.; Gu, D.; Spoorenberg, T.; Alkema, L.; Fosdick, B.K.; Chunn, J.; Lalic, N.; et al. World population stabilization unlikely this century. Science 2014, 346, 234–237. [Google Scholar] [CrossRef]
  6. Liu, G.; Yi, Z.; Zhang, X.; Shrestha, A.; Martek, I.; Wei, L. An evaluation of urban renewal policies of Shenzhen, China. Sustainability 2017, 9, 1001. [Google Scholar] [CrossRef]
  7. Zoomers, A.; Van Noorloos, F.; Otsuki, K.; Steel, G.; Van Westen, G. The rush for land in an urbanizing world: From land grabbing toward developing safe, resilient, and sustainable cities and landscapes. World Dev. 2017, 92, 242–252. [Google Scholar] [CrossRef]
  8. Li, J.; Burgess, G.; Sielker, F. Political mobilisation and institutional layering in urban regeneration: Transformation of land redevelopment governance in China. Cities 2023, 141, 104508. [Google Scholar] [CrossRef]
  9. Li, X.; Tang, J.; Huang, J. Place-based policy upgrading, business environment, and urban innovation: Evidence from high-tech zones in China. Int. Rev. Financ. Anal. 2023, 86, 102545. [Google Scholar] [CrossRef]
  10. Zheng, H.W.; Shen, G.Q.; Wang, H. A review of recent studies on sustainable urban renewal. Habitat Int. 2014, 41, 272–279. [Google Scholar] [CrossRef]
  11. Liu, Z.; Liu, S. Urban shrinkage in a developing context: Rethinking China’s present and future trends. Sustain. Cities Soc. 2022, 80, 103779. [Google Scholar] [CrossRef]
  12. Azhdari, A.; Sasani, M.A.; Soltani, A. Exploring the relationship between spatial driving forces of urban expansion and socioeconomic segregation: The case of Shiraz. Habitat Int. 2018, 81, 33–44. [Google Scholar] [CrossRef]
  13. Ye, L.; Peng, X.; Aniche, L.Q.; Scholten, P.H.; Ensenado, E.M. Urban renewal as policy innovation in China: From growth stimulation to sustainable development. Public Adm. Dev. 2021, 41, 23–33. [Google Scholar] [CrossRef]
  14. Nachmany, H.; Hananel, R. The Fourth Generation: Urban renewal policies in the service of private developers. Habitat Int. 2022, 125, 102580. [Google Scholar] [CrossRef]
  15. Chan, K.S.; Siu, Y.F.P. Urban governance and social sustainability: Effects of urban renewal policies in Hong Kong and Macao. Asian Educ. Dev. Stud. 2015, 4, 330–342. [Google Scholar] [CrossRef]
  16. Li, Y.; Zhang, C.; Zhao, L. Research on the Characteristics and Evolution of Digital Economic Policy Topics Based on BERTopic Model. In Proceedings of the 2024 2nd International Conference on Digital Economy and Management Science (CDEMS 2024), Wuhan, China, 26–28 April 2024; Atlantis Press: Dordrecht, The Netherlands; pp. 38–53. [Google Scholar]
  17. Ocal, A. Perceptions of the Future of AI on Social Media: A Topic Modeling and Sentiment Analysis Approach. IEEE Access 2024, 12, 182386–182409. [Google Scholar] [CrossRef]
  18. Li, M.; Dong, Y. Research on the evaluation of China’s Supply Chain Finance policy based on text mining. PLoS ONE 2025, 20, e0317743. [Google Scholar] [CrossRef] [PubMed]
  19. Banerjee, S.; Pan, A. From colonial legacies to linguistic inclusion: A BERTopic enhanced bibliometric insight into global south higher education. IEEE Access 2024, 12, 117418–117435. [Google Scholar] [CrossRef]
  20. Pan, W.; Du, J. Towards sustainable urban transition: A critical review of strategies and policies of urban village renewal in Shenzhen, China. Land Use Policy 2021, 111, 105744. [Google Scholar] [CrossRef]
  21. Min, X.; Shi, N.; Duan, K. Diachronic Analysis of Urban Regeneration Policies in Shanghai: Instruments, Modes, and Methods. J. Urban Plan. Dev. 2025, 151, 04025002. [Google Scholar] [CrossRef]
  22. Lyu, P.H.; Zhang, M.Z.; Wang, T.R.; Zhang, X.F.; Ye, C.D. Research Trends, Knowledge Base, and Hotspot Evolution of Urban Renewal: A Bibliometric Approach. J. Urban Plan. Dev. 2023, 149, 04023033. [Google Scholar] [CrossRef]
  23. Hu, F.Z.; Lin, G.C.; Yeh, A.G.; He, S.; Liu, X. Reluctant policy innovation through profit concession and informality tolerance: A strategic relational view of policy entrepreneurship in China’s urban redevelopment. Public Adm. Dev. 2020, 40, 65–75. [Google Scholar] [CrossRef]
  24. Chen, K.; Wei, G. Public sentiment analysis on urban regeneration: A massive data study based on sentiment knowledge enhanced pre-training and latent Dirichlet allocation. PLoS ONE 2023, 18, e0285175. [Google Scholar] [CrossRef]
  25. Jiang, L.; Lai, Y.; Wu, Y.; Wang, R.; Tang, X.; Li, X.; Ma, D.; Guo, R. Public attitudes toward state-led urban village rehabilitation in Shenzhen, China. Habitat Int. 2025, 160, 103404. [Google Scholar] [CrossRef]
  26. Morandell, T.; Wicki, M.; Kaufmann, D. The planning of urban–rural linkages: An automated content analysis of spatial plans adopted by European intermediate cities. Landsc. Urban Plan. 2025, 255, 105258. [Google Scholar] [CrossRef]
  27. Dong, X.; Wang, C.; Zhang, F.; Zhang, H.; Xia, C. China’s low-carbon policy intensity dataset from national-to prefecture-level over 2007–2022. Sci. Data 2024, 11, 213. [Google Scholar] [CrossRef]
  28. Hu, W.; Wang, S.; Zhai, W. Human-centric vs. technology-centric approaches in a top-down smart city development regime: Evidence from 341 Chinese cities. Cities 2023, 137, 104271. [Google Scholar] [CrossRef]
  29. Sheng, S.; Li, Y.; Zhao, Z. How does regional policy coordination help achieve the low-carbon development?: A study of theoretical mechanisms and empirical analysis from China. Environ. Dev. Sustain. 2024, 1–33. [Google Scholar] [CrossRef]
  30. Du, B.X.; Liu, G.Y. Topic analysis in lda based on keywords selection. J. Comput. 2021, 32, 1–12. [Google Scholar]
  31. Liu, Y.; Wan, F. Unveiling temporal and spatial research trends in precision agriculture: A BERTopic text mining approach. Heliyon 2024, 10, e36808. [Google Scholar] [CrossRef] [PubMed]
  32. Jelodar, H.; Wang, Y.; Yuan, C.; Feng, X.; Jiang, X.; Li, Y.; Zhao, L. Latent Dirichlet allocation (LDA) and topic modeling: Models, applications, a survey. Multimed. Tools Appl. 2019, 78, 15169–15211. [Google Scholar] [CrossRef]
  33. Su, Y.S.; Wang, J.Q.; Tu, S.H.; Liao, K.T.; Lin, C.L. Detecting latent topics and trends in IoT and e-commerce using BERTopic modeling. Internet Things 2025, 32, 101604. [Google Scholar] [CrossRef]
  34. Kim, C.; Lee, J. Discovering patterns and trends in customer service technologies patents using large language model. Heliyon 2024, 10, e34701. [Google Scholar] [CrossRef]
  35. Hitl, M.; Greb, N.; Bagić Babac, M. Quantitative analysis of the relationship between expressing gratitude and forgiveness and user sentiment on social media. Glob. Knowl. Mem. Commun. 2025, 74, 42–62. [Google Scholar] [CrossRef]
  36. Ramamoorthy, T.; Kulothungan, V.; Mappillairaju, B. Topic modeling and social network analysis approach to explore diabetes discourse on twitter in India. Front. Artif. Intell. 2024, 7, 1329185. [Google Scholar] [CrossRef]
  37. Grootendorst, M. BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv 2022, arXiv:2203.05794. [Google Scholar]
  38. Jin, J.; Du, H.; Liu, Z. Measurement and evolution of government attention to the health industry in China based on the BERTopic model. PLoS ONE 2025, 20, e0329300. [Google Scholar] [CrossRef]
  39. Son, H.; Park, Y.E. Agenda-setting effects for covid-19 vaccination: Insights from 10 million textual data from social media and news articles using BERTopic. Int. J. Inf. Manag. 2025, 83, 102907. [Google Scholar] [CrossRef]
  40. Chen, W.; Rabhi, F.; Liao, W.; Al-Qudah, I. Leveraging state-of-the-art topic modeling for news impact analysis on financial markets: A comparative study. Electronics 2023, 12, 2605. [Google Scholar] [CrossRef]
  41. Egger, R.; Yu, J. A topic modeling comparison between lda, nmf, top2vec, and bertopic to demystify twitter posts. Front. Sociol. 2022, 7, 886498. [Google Scholar] [CrossRef]
  42. Hong, W.T.; Whyte, J.; Xue, J. A Natural Language Processing–Driven Framework for Policymaking in Infrastructure Development. J. Constr. Eng. Manag. 2025, 151, 04025025. [Google Scholar] [CrossRef]
  43. Yang, C.; Huang, C. Quantitative mapping of the evolution of AI policy distribution, targets and focuses over three decades in China. Technol. Forecast. Soc. Change 2022, 174, 121188. [Google Scholar] [CrossRef]
  44. Wang, Y.; Chen, H.; Gu, X. The characteristics of policy supply in the construction of smart emergency management in China: Based on text mining method. J. Clean. Prod. 2025, 487, 144451. [Google Scholar] [CrossRef]
  45. Jin, X.; Zhou, W.; Zhu, Q.; Wang, W.; Xu, G. Research on the Analysis and Application of Technological Supply and Demand Structure Based on LDA and BERTopic Models. Cogn. Robot. 2025, 5, 260–275. [Google Scholar] [CrossRef]
  46. Luo, L.; Yang, Z.; Yang, P.; Zhang, Y.; Wang, L.; Lin, H.; Wang, J. An attention-based BiLSTM-CRF approach to document-level chemical named entity recognition. Bioinformatics 2018, 34, 1381–1388. [Google Scholar] [CrossRef]
  47. Li, Q.; Chen, H.; Long, R.; Huang, Z.; Yang, S.; Sun, Q.; Sun, Y.; Ye, X. Research on data-driven group consensus decision-making of green methanol vehicle evaluation based on BERTopic text mining. Sustain. Energy Technol. Assess. 2025, 80, 104362. [Google Scholar] [CrossRef]
  48. Atzeni, D.; Bacciu, D.; Mazzei, D.; Prencipe, G. A systematic review of Wi-Fi and machine learning integration with topic modeling techniques. Sensors 2022, 22, 4925. [Google Scholar] [CrossRef]
  49. Lu, J.; Zhang, H.; Zhang, X. Cultural ecosystem services in China’s national parks and their impact on public online engagement—Analysis of Douyin short videos data based on BERTopic modeling. J. Nat. Conserv. 2025, 87, 126969. [Google Scholar] [CrossRef]
  50. Healy, J.; McInnes, L. Uniform manifold approximation and projection. Nat. Rev. Methods Primers 2024, 4, 82. [Google Scholar] [CrossRef]
  51. McInnes, L.; Healy, J.; Astels, S. hdbscan: Hierarchical density based clustering. J. Open Source Softw. 2017, 2, 205. [Google Scholar] [CrossRef]
  52. Liu, C.Z.; Sheng, Y.X.; Wei, Z.Q.; Yang, Y.Q. Research of text classification based on improved TF-IDF algorithm. In Proceedings of the 2018 IEEE International Conference of Intelligent Robotic and Control Engineering (IRCE), Lanzhou, China, 24–27 August 2018; IEEE: New York, NY, USA, 2018; pp. 218–222. [Google Scholar]
  53. Farea, A.; Tripathi, S.; Glazko, G.; Emmert-Streib, F. Investigating the optimal number of topics by advanced text-mining techniques: Sustainable energy research. Eng. Appl. Artif. Intell. 2024, 136, 108877. [Google Scholar] [CrossRef]
  54. Amin, M.M.; Sani, N.S.; Nasrudin, M.F.; Abdullah, S.; Chhabra, A.; Abd Kadir, F. Clustering analysis for classifying fake real estate listings. PeerJ Comput. Sci. 2024, 10, e2019. [Google Scholar] [CrossRef] [PubMed]
  55. Yi, J.; Oh, Y.K.; Kim, J.M. Unveiling the drivers of satisfaction in mobile trading: Contextual mining of retail investor experience through BERTopic and generative AI. J. Retail. Consum. Serv. 2025, 82, 104066. [Google Scholar] [CrossRef]
Figure 1. Research framework.
Figure 2. Declining trend of keyword importance weights across topics.
Figure 3. Topic distance visualization and cosine similarity heat map.
Figure 4. Hierarchical clustering of policy topics.
Figure 5. Urban renewal policy theme intensity map from 2000 to 2025.
Figure 6. Hierarchical clustering diagram of the four major economic regions.
Table 1. Characteristic words and representative sentences of urban renewal policy topics.

| Topic | Characteristic Terms |
|---|---|
| 0 | Facilities_Upgrade_Residential Community |
| 1 | Old Residential Communities_Jurisdiction_Urban Village |
| 2 | Responsibility_Mechanism_Principle |
| 3 | Funding_Loan_Subsidy |
| 4 | Compensation_Expropriation_Housing |
| 5 | Encourage_Social Capital_Explore |
| 6 | Issue_Administrative Region_Repeal |
| 7 | Planning_Plan_Review |
| 8 | Formulate_Urban_Rural |
| 9 | Compensation_Housing_Expropriation |
| 10 | Legal Liability_Conduct_Lawful |
| 11 | Land Grant_Land Use_Land Allocation |
| 12 | Diligent_Management Committee_Trial Implementation |
| 13 | Registration_Real Estate_Process |
| 14 | Shantytown Renovation_Infrastructure_Registration |
| 15 | Review_Plan_Submit |
| 16 | Planning_Content_Renewal |
| 17 | Urban-Rural_Commencement_Committee |
| 18 | Structure_Function_Housing |
| 19 | Preferential Policies_Pension Insurance_Reduction or Exemption |
| 20 | Funding_Urban Village_Review |
| 21 | Capacity_Civil Liability_Qualifications |
| 22 | Housing Expropriation_Expropriation_Signing |
| 23 | Public Notification_Competent Authority_Written |
| 24 | Service Project_Contract_Shantytown Renovation |
| 25 | Approval_Audit_Jurisdiction |
| 26 | Completed_More_Old Residential Communities |
| 27 | Assessment_Expropriation_Exchange |
| 28 | Market Operation_Principle_Adhere |
| 29 | Objective_Development_People |
| 30 | Construction_Design_Construction Unit |
| 31 | Land Use_Function_Facilities |
| 32 | Tax Exemption_Deed Tax_Stamp Duty |
| 33 | People’s Court_File_Compulsory Enforcement |
Table 2. Sentence counts and proportions of topic directions.

| Topic Direction | Topics | Sentence Count | Proportion (%) |
|---|---|---|---|
| Spatial Improvement and Facility Upgrades | 0, 2, 14, 18, 26, 28, 29, 31 | 10,426 | 35.48 |
| Project Collaboration and Approval | 1, 6, 7, 8, 12, 15, 16, 17, 20, 23, 25 | 8840 | 30.08 |
| Land Acquisition and Compensation | 4, 9, 11, 13, 22, 27, 33 | 4574 | 15.56 |
| Fiscal Incentives and Funding Support | 3, 5, 19, 32 | 3972 | 13.51 |
| Institutional Guarantees and Long-term Governance | 10, 21, 24, 30 | 1578 | 5.37 |
Table 3. Annual distribution of topic strength in urban renewal policies.

| Year | Topic 0 | Topic 1 | Topic 2 | Topic 32 | Topic 33 |
|---|---|---|---|---|---|
| 2000 | 0.0248 | 0.0115 | 0.0214 | 0.0208 | 0.0217 |
| 2001 | 0.0205 | 0.0120 | 0.0144 | 0.0150 | 0.0247 |
| 2002 | 0.0506 | 0.0145 | 0.0202 | 0.0248 | 0.0247 |
| 2003 | 0.0808 | 0.0147 | 0.0214 | 0.0152 | 0.0168 |
| 2004 | 0.0478 | 0.0269 | 0.0320 | 0.0165 | 0.0365 |
| 2005 | 0.0488 | 0.0314 | 0.0207 | 0.0198 | 0.0251 |
| 2006 | 0.0205 | 0.0241 | 0.0115 | 0.0330 | 0.0355 |
| 2007 | 0.0589 | 0.0247 | 0.0256 | 0.0276 | 0.0260 |
| 2008 | 0.0257 | 0.0240 | 0.0145 | 0.0272 | 0.0331 |
| 2009 | 0.0592 | 0.0295 | 0.0390 | 0.0160 | 0.0224 |
| 2010 | 0.0631 | 0.0201 | 0.0328 | 0.0305 | 0.0261 |
| 2011 | 0.0418 | 0.0255 | 0.0211 | 0.0279 | 0.0276 |
| 2012 | 0.0355 | 0.0224 | 0.0204 | 0.0223 | 0.0221 |
| 2013 | 0.0534 | 0.0280 | 0.0271 | 0.0255 | 0.0253 |
| 2014 | 0.0534 | 0.0360 | 0.0335 | 0.0326 | 0.0221 |
| 2015 | 0.0423 | 0.0342 | 0.0241 | 0.0249 | 0.0258 |
| 2016 | 0.0453 | 0.0248 | 0.0320 | 0.0240 | 0.0199 |
| 2017 | 0.0971 | 0.0260 | 0.0314 | 0.0139 | 0.0229 |
| 2018 | 0.0533 | 0.0391 | 0.0394 | 0.0166 | 0.0225 |
| 2019 | 0.0658 | 0.0272 | 0.0262 | 0.0162 | 0.0252 |
| 2020 | 0.1247 | 0.0398 | 0.0519 | 0.0165 | 0.0214 |
| 2021 | 0.1207 | 0.0539 | 0.0510 | 0.0164 | 0.0184 |
| 2022 | 0.1247 | 0.0466 | 0.0459 | 0.0141 | 0.0167 |
| 2023 | 0.1200 | 0.0323 | 0.0351 | 0.0145 | 0.0178 |
| 2024 | 0.0800 | 0.0263 | 0.0301 | 0.0154 | 0.0257 |
| 2025 | 0.0927 | 0.0302 | 0.0265 | 0.0135 | 0.0230 |
Table 4. Categories of China’s four major economic zones.

| Region | Province |
|---|---|
| Eastern | Beijing, Tianjin, Hebei, Shanghai, Jiangsu, Zhejiang, Fujian, Shandong, Guangdong, Hainan |
| Western | Inner Mongolia, Guangxi, Chongqing, Sichuan, Guizhou, Yunnan, Tibet, Shaanxi, Gansu, Qinghai, Ningxia, Xinjiang |
| Central | Shanxi, Henan, Hubei, Hunan, Jiangxi, Anhui |
| Northeastern | Liaoning, Jilin, Heilongjiang |
Notes: Hong Kong Special Administrative Region, Macau Special Administrative Region, and Taiwan Province are not included in the research sample.
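The regional grouping in Table 4 can be expressed as a simple lookup, sketched below. The province lists are copied from the table; the function name, the sample documents, and the "Unassigned" label for issuers outside the table (for example, central government bodies) are hypothetical choices made for illustration, not part of the study.

```python
# Province lists copied from Table 4.
REGION_PROVINCES = {
    "Eastern": ["Beijing", "Tianjin", "Hebei", "Shanghai", "Jiangsu",
                "Zhejiang", "Fujian", "Shandong", "Guangdong", "Hainan"],
    "Western": ["Inner Mongolia", "Guangxi", "Chongqing", "Sichuan", "Guizhou",
                "Yunnan", "Tibet", "Shaanxi", "Gansu", "Qinghai", "Ningxia",
                "Xinjiang"],
    "Central": ["Shanxi", "Henan", "Hubei", "Hunan", "Jiangxi", "Anhui"],
    "Northeastern": ["Liaoning", "Jilin", "Heilongjiang"],
}

# Invert the table into a province -> region lookup.
PROVINCE_TO_REGION = {
    province: region
    for region, provinces in REGION_PROVINCES.items()
    for province in provinces
}


def group_by_region(documents):
    """Group (issuer, text) pairs by economic region.

    Issuers not listed in Table 4 are collected under the hypothetical
    label "Unassigned".
    """
    groups = {}
    for issuer, text in documents:
        region = PROVINCE_TO_REGION.get(issuer, "Unassigned")
        groups.setdefault(region, []).append(text)
    return groups


# Hypothetical sample documents, for illustration only.
sample = [
    ("Guangdong", "policy text A"),
    ("Sichuan", "policy text B"),
    ("Liaoning", "policy text C"),
    ("State Council", "policy text D"),
]
grouped = group_by_region(sample)
print({region: len(texts) for region, texts in grouped.items()})
```

Each regional group could then be passed to a separate topic model, which is one way to obtain region-specific topic numbering of the kind used in the regional comparison.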

Share and Cite

MDPI and ACS Style

Zhang, G.; Liu, X.; Luo, Q. Topic Mining and Evolutionary Analysis of Urban Renewal Policy Texts in China. Buildings 2025, 15, 3324. https://doi.org/10.3390/buildings15183324
