Article

Development Analysis of China’s New-Type Power System Based on Governmental and Media Texts via Multi-Label BERT Classification

1 School of Energy Power and Mechanical Engineering, North China Electric Power University, Beijing 102206, China
2 State Grid Xinjiang Electric Power Company Economic and Technological Research Institute, Ürümqi 830063, China
3 State Grid Xinjiang Electric Power Corporation, Ürümqi 830000, China
* Author to whom correspondence should be addressed.
Energies 2025, 18(17), 4650; https://doi.org/10.3390/en18174650
Submission received: 30 July 2025 / Revised: 28 August 2025 / Accepted: 29 August 2025 / Published: 2 September 2025

Abstract

In response to China’s dual-carbon strategy, this study proposes a comprehensive analytical framework to identify the evolutionary pathways of key policy tasks in developing a new-type power system. A dual-channel data acquisition process was designed to extract, standardize, and segment policy documents and online texts into a unified corpus. A multi-label BERT classification model was then developed, incorporating domain-specific terminology injection, label-wise attention, dynamic threshold scanning, and imbalance-aware weighting. The model was trained and validated on 200 energy news articles, 100 official policy releases, and 10 strategic planning documents. By the 10th epoch, it achieved convergence with a Macro-F1 of 0.831, Micro-F1 of 0.849, and Samples-F1 of 0.855. Ablation studies confirmed the significant performance gain over simplified configurations. Structural label analysis showed “Build system-friendly new energy power stations” was the most frequent label (107 in plans, 80 in news, 24 in policies) and had the highest co-occurrence (81 times) with “Optimize and strengthen the main grid framework.” The label co-occurrence network revealed multi-layered couplings across generation, transmission, and storage. The Priority Evaluation Index (PEI) further identified “Build shared energy storage power stations” as a structurally central task (centrality = 0.71) despite its lower frequency, highlighting its latent strategic importance. Within the domain of national-level public policy and planning documents, the proposed framework shows reliable and reusable performance. Generalization to sub-national and project-level corpora is left for future work, where we will extend the corpus and reassess robustness without altering the core methodology.

1. Introduction

In response to China’s dual-carbon strategy, the construction of a new-type power system dominated by renewable energy has become a national priority. This system aims to ensure a low-carbon, safe, and efficient electricity supply while addressing the growing variability and uncertainty from high renewable penetration [1]. A series of top-level policy frameworks, including the “1 + N” system and the “Ten Carbon-Peak Actions,” reflect the state’s intensive policy deployment across energy structure, grid modernization, and energy storage development [2,3]. As system transformation accelerates, policy has evolved from general guidance to a core driver of structural adjustment and technical advancement in the power sector [4,5]. Keywords such as “source-grid-load-storage coordination” and “virtual power plants” increasingly appear in official documents, reflecting the complexity and interdisciplinary nature of policy tasks. However, policy texts in this domain are often semantically dense, label-rich, and structurally heterogeneous [6]. They span multiple domains—technology, market, regulation—and vary widely in format and terminology. Traditional rule-based or manual analysis methods are no longer adequate. To address this, there is a growing demand for an intelligent, semantics-aware analytical framework capable of multi-label recognition, structural abstraction, and dynamic reasoning [7,8].
Policy texts in the new-type power system are inherently complex, often involving multiple interrelated topics such as energy technologies, grid planning, market reforms, and regulatory mechanisms. This multi-label nature poses challenges for conventional classification methods, which typically ignore label dependencies and struggle with long-tailed label distributions [7,8,9,10]. Moreover, semantic ambiguity and inconsistent terminology are common across documents issued by different agencies. The same term may carry varying meanings in different contexts, complicating syntactic parsing and semantic alignment [6,7,11]. Finally, energy policies evolve dynamically over time, with shifting priorities and emerging themes. Static models fail to capture these temporal patterns, hindering the detection of policy evolution and task transitions [12,13]. To address these challenges, a more advanced framework is needed—one that combines deep semantic understanding, structural label modeling, and temporal reasoning to support comprehensive policy analysis in multi-label settings.
The Bidirectional Encoder Representations from Transformers (BERT) model has gained significant attention in the field of natural language processing (NLP) due to its bidirectional context modeling and superior performance in various downstream tasks. BERT has demonstrated remarkable advantages in multi-label text classification by capturing semantic dependencies and contextual nuance. Yarullin and Serdyukov [14] proposed a sequence-to-sequence model based on BERT to handle multi-label classification with improved accuracy. Veeranki et al. [15] successfully applied BERT to classify clinical texts with multiple overlapping labels, showcasing its robustness in handling semantic complexity. Similarly, Singh [16] introduced a practical guide for constructing multi-label and multi-class classifiers using BERT, reinforcing the model’s adaptability in complex classification settings. These studies confirm BERT’s strong capacity in managing overlapping categories, ambiguous meanings, and diverse syntactic structures, making it an appealing choice for high-dimensional multi-label classification problems.
Despite these advantages, current BERT-based models remain insufficiently adapted to the structural and semantic particularities of electricity policy texts. Such documents are often rich in technical terminology, exhibit high label density, and vary significantly across policy levels and document types. Some studies have applied BERT in policy-related contexts, such as Yang et al.’s [17] classification of new energy policy texts and Corringham et al.’s [18] analysis of Paris Agreement climate action plans, but these models were not specifically optimized for the characteristics of electricity sector policy discourse. Moreover, most available BERT variants do not account for label imbalance, inter-label correlation, or temporal patterns that are frequently observed in energy regulatory documentation [19,20]. As a result, there is a critical need to design and train BERT-based models specifically tailored to the semantics, structure, and evolving logic of electricity policy texts, to support accurate task identification, regulatory mapping, and long-term trend analysis.
This study proposes an integrated analytical framework that unifies three essential dimensions—data-driven modeling, semantic structuring, and dynamic mining—to address the semantic complexity, structural ambiguity, and evolving nature of electricity policy texts. While leveraging established NLP techniques like BERT and attention mechanisms, our primary contribution lies in their novel integration and application to the domain of energy policy analysis. Specifically, we develop a complete pipeline—from specialized data acquisition and domain-adapted model training to structural network analysis and priority evaluation—tailored specifically for deciphering China’s new-type power system policy corpus. This systematic approach not only enhances the semantic expressiveness and task interpretability of policy text classification but also provides a structural lens to infer task integration paths and predict directional evolution trends, offering a new paradigm for data-driven policy analysis in the energy sector.

2. Literature Review and Algorithm Design

2.1. Review of Traditional Policy Analysis Methods and Their Constraints

Traditional energy policy analysis methods—such as cost-benefit analysis (CBA), integrated assessment models (IAMs), and scenario-based planning—have been widely adopted in the energy sector. While these approaches provide structured decision-making tools, they often face significant limitations in addressing the complexity, uncertainty, and multidimensionality of contemporary energy systems.
CBA, for instance, has long been criticized for undervaluing long-term environmental and intergenerational impacts due to its reliance on discount rates, which can bias policy outcomes toward short-term economic efficiency [21,22]. Moreover, CBA inadequately captures non-market values, such as ecological resilience and social equity, which are increasingly critical in energy transition contexts [23].
IAMs attempt to integrate economic, technological, and environmental dimensions into comprehensive models. However, these models often rely on simplified assumptions and deterministic frameworks that inadequately reflect real-world uncertainties, behavioral variability, and system feedback [24,25]. Studies have shown that IAMs can generate policy prescriptions that are either overly optimistic or misleading under uncertain innovation pathways [26].
Scenario-based policy methods offer flexibility in envisioning alternative futures but are limited by the subjectivity and scope of selected scenarios [27]. They tend to produce narratives disconnected from empirical dynamics and often overlook adaptive policy mechanisms, which are essential under fast-evolving energy conditions [28]. These traditional tools generally lack responsiveness to feedback-driven changes and are often static by design.
In addition, many conventional approaches inadequately account for the social and political dimensions of energy systems. Issues such as distributive justice, public participation, and institutional inertia are either overlooked or treated as peripheral, despite their demonstrated influence on policy acceptance and implementation outcomes [29]. This is particularly problematic in low-carbon transitions, where legitimacy and inclusion are essential [30].
Given the rising complexity of energy policy ecosystems and the accelerating pace of institutional and technological change, there is a growing demand for advanced analytical frameworks. These should incorporate semantic modeling, network structures, and dynamic adaptation to support multi-label task identification and policy evolution analysis beyond the capacity of traditional methods.

2.2. Theoretical Innovation and Positioning of This Study

In contrast to the limitations of traditional policy analysis methods discussed above, this study proposes an integrated analytical framework that offers a novel theoretical perspective for decoding complex policy texts. Rather than merely applying NLP techniques, our framework innovates through a tripartite theoretical fusion: (1) deep semantic understanding, achieved through a domain-optimized BERT model that captures the nuanced context of electricity policy terminology; (2) structural relationship mining, which employs label co-occurrence networks to map the interdependencies and synergistic pathways among multi-dimensional policy tasks, moving beyond isolated analysis; and (3) dynamic priority evaluation, introduced via the Priority Evaluation Index (PEI) that integrates frequency and network centrality to identify both overt and latent strategic foci. This “data-driven modeling—structural abstraction—decision support” pipeline establishes a new paradigm for policy text analysis. It provides a quantifiable, interpretable, and structurally-aware approach to understanding the composition and evolution of policy tasks, thereby addressing the critical gaps of static, reductionist, and semantically-agnostic analysis inherent in conventional methods like CBA and IAMs.

2.3. Algorithm Design

2.3.1. Algorithm Framework and Design Principles

To address the limitations of traditional policy analysis methods in large-scale processing, semantic modeling, and task abstraction, this study proposes a structured mining approach for policy texts related to the development of new-type power systems. The overall process follows a technical pipeline of “data extraction—text sanitization and standardization—semantic modeling—structure generation—task inference,” forming an integrated research framework that encompasses “semantics, structure, and decision-making,” as illustrated in Figure 1.
First, a dual-channel Policy Text Retrieval Layer is constructed, comprising a web crawler for collecting policy news from authoritative platforms (e.g., NDRC, NEA) and a PDF extractor for electricity-related documents. The retrieved texts are then cleaned and standardized by removing noise (e.g., table of contents, footers, non-Chinese characters) and applying domain-specific stopword filtering, yielding a structured corpus for modeling.
Second, a Domain-Optimized BERT Training Framework is developed for multi-label classification, incorporating label-wise attention, terminology injection, dynamic thresholding, and label weighting to enhance recognition of low-frequency and overlapping policy tasks. An ablation study confirms the effectiveness of each module.
Third, the Label Prediction Evaluation Framework analyzes model outputs through label popularity and co-occurrence networks, capturing semantic coupling among tasks. A Priority Evaluation Index (PEI), combining label frequency and centrality, identifies key task labels and infers policy evolution trajectories.
Overall, the methodology integrates multi-source data fusion, semantic modeling, and structural inference, offering robust generalization and strong applicability to policy analysis in new-type power systems.

2.3.2. Specifications of the Policy Document Harvesting Algorithm

To collect policy news and public notices from government websites (e.g., China News Service, NEA), an automated data harvesting model was developed using the requests and BeautifulSoup4 libraries. The process consists of three layers: (1) Request Layer with timeout and retry mechanisms to ensure stability; (2) Parsing Layer that extracts core content via <p> tags and filters out irrelevant information; and (3) Post-Processing Layer that handles encoding conversion and output standardization.
This structured approach ensures both high efficiency and accuracy, enabling seamless integration with the PDF extraction pipeline. The web crawler expands coverage to time-sensitive and narrowly scoped texts, improving the comprehensiveness and representativeness of the policy corpus. Specific parameters are provided in Table 1.

2.3.3. Specifications of the PDF Extraction Algorithm for Technical Documents

Ten core policy documents—including the 14th Five-Year Special Plans and the Electricity Development Blue Book issued by the NDRC—were selected as the corpus. To address their multi-level structure, semantic density, and formatting diversity, a PyPDF2-based parsing framework was developed with three stages: (1) Table of Contents Recognition using regular expressions and a title feature dictionary to exclude unstructured content; (2) Main Text Localization through a dual-threshold confidence model combining structural markers (e.g., “Chapter 1”) with layout cues; and (3) Adaptive Text Slicing, employing a semantic sliding window (512-character length, 15% overlap) aligned with BERT input constraints to preserve content integrity and context. Detailed parameters are shown in Table 2.
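The slicing stage can be illustrated with a short sketch. It is a minimal example under stated assumptions, not the authors' implementation: it cuts on raw character positions only, whereas the "semantic" window described above may also respect sentence boundaries, and the function name `slice_text` is hypothetical.

```python
def slice_text(text: str, window: int = 512, overlap_ratio: float = 0.15) -> list[str]:
    """Split text into fixed-size character windows that overlap by a given ratio."""
    if window <= 0 or not 0 <= overlap_ratio < 1:
        raise ValueError("window must be positive and overlap_ratio in [0, 1)")
    # Distance between consecutive window starts (435 characters for 512 / 15%).
    step = max(1, int(window * (1 - overlap_ratio)))
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + window])
        if start + window >= len(text):
            break  # the current window already reaches the end of the text
        start += step
    return chunks
```

With the default parameters, consecutive windows share 77 characters, so a sentence cut at one window boundary still appears intact in the neighbouring window.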

2.3.4. Specifications of the BERT-Driven Multi-Label Classification Framework

A multi-label classification framework based on BERT-base-Chinese was constructed for label prediction. Its core components include: (1) Label Semantic Embedding, modeling interactions between label vectors and Transformer hidden states to capture semantic associations; (2) Label-Wise Attention, establishing cross-modal links between text and labels to enhance fine-grained recognition; and (3) Dual Optimization, combining category weight compensation (pos_weight) during training and dynamic thresholding with Top-k fallback during prediction to address class imbalance and improve long-tail label performance.
By jointly optimizing label semantics and data distribution, the framework achieves strong generalization under complex multi-label conditions. Parameter settings are detailed in Table 3.
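The category weight compensation can be sketched in plain Python. This is an illustration under assumptions rather than the configuration in Table 3: the weight of each label is taken as the ratio of negative to positive samples, which is the convention suggested in the PyTorch documentation for `pos_weight`, and both function names are hypothetical.

```python
import math


def positive_class_weights(label_matrix):
    """Per-label weight = (#negative samples) / (#positive samples)."""
    n = len(label_matrix)
    m = len(label_matrix[0])
    weights = []
    for j in range(m):
        pos = sum(row[j] for row in label_matrix)
        # Labels that never occur get a neutral weight to avoid division by zero.
        weights.append((n - pos) / pos if pos else 1.0)
    return weights


def _softplus(z):
    """Numerically stable log(1 + exp(z))."""
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def weighted_bce_with_logits(logits, targets, pos_weight):
    """Mean over labels of -[w*y*log(sigmoid(x)) + (1-y)*log(1-sigmoid(x))]."""
    total = 0.0
    for x, y, w in zip(logits, targets, pos_weight):
        total += w * y * _softplus(-x) + (1 - y) * _softplus(x)
    return total / len(logits)
```

Rare labels receive weights above 1, so a missed positive for a long-tail task costs more than a missed positive for a frequent one.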

3. Methodology

3.1. Data Acquisition and Preprocessing Techniques

To build a policy corpus for new-type power systems and enable structured label modeling, two automated Python 3.11.7-based extraction pipelines were developed for structured PDF documents and unstructured web texts. It is important to note that the scale of the curated corpus, while sufficient for initial model validation and the structural analysis presented in this study, is limited by the availability of high-quality, authoritative policy documents at the national level. The focus on these top-tier sources ensures analytical rigor and a clear view of the central policy discourse, but it necessarily excludes a vast number of regional and project-level texts. This scope definition allows us to establish a robust foundational framework, which can be expanded upon in future work with broader data collection. As shown in Figure 2, the workflow includes the following:
(1) Text Acquisition: News articles are collected via web scraping (requests + BeautifulSoup), and PDF documents are parsed using PyPDF2.
(2) Content Extraction: Paragraphs are extracted from <p> tags in HTML; tables of contents are skipped in PDFs to isolate the main content.
(3) Cleaning and Standardization: Regular expressions are used to merge whitespace and remove invalid characters, headers, footers, and noise; the first chapter title marks the start of the main body.
(4) Segmentation and Formatting: Text is split into 512-character samples with empty “label” fields, and stored in structured Excel files for downstream BERT training.

3.1.1. Web-Based Text Data Collection and Cleaning

To handle text announcements and news content published on official websites such as the NEA and the NDRC, a stable web scraping module based on requests and BeautifulSoup4 was developed. This module incorporates capabilities such as encoding recognition, content parsing, paragraph merging, and format unification. The overall logic of the module is summarized in Algorithm 1:
Algorithm 1. Web-based text data collection and cleaning
Function FetchTextFromWebpage(url):
  Try:
    response = DownloadWebpage(url)
    paragraph_list = Extract <p> Tags from HTML
    TEXT = Join all paragraphs as a single string
    Clean TEXT by:
      - Character filtering
      - Whitespace normalization
    Return cleaned TEXT
  Except:
    Return empty string
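Algorithm 1 can be sketched as runnable Python. To keep the example free of third-party dependencies, it swaps in the standard library (`urllib.request` and `html.parser`) for the requests and BeautifulSoup4 libraries used in this study. The retry count, back-off, character filter, and all function and class names are assumptions made for illustration.

```python
import re
import time
import urllib.request
from html.parser import HTMLParser

_CONTROL = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\u200b\ufeff]")
_WHITESPACE = re.compile(r"\s+")


class ParagraphCollector(HTMLParser):
    """Collects the text found inside <p> elements and ignores everything else."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._in_p = False
        self._buffer = []

    def _flush(self):
        text = "".join(self._buffer).strip()
        if text:
            self.paragraphs.append(text)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._flush()  # also closes a paragraph whose end tag was omitted
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._flush()
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)


def clean_text(text: str) -> str:
    """Character filtering (control characters) and whitespace normalization."""
    return _WHITESPACE.sub(" ", _CONTROL.sub("", text)).strip()


def extract_text_from_html(html: str) -> str:
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return clean_text("\n".join(parser.paragraphs))


def fetch_text_from_webpage(url: str, timeout: float = 10.0, retries: int = 3) -> str:
    """Download a page with timeout and retry; return '' on failure, as in Algorithm 1."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as resp:
                charset = resp.headers.get_content_charset() or "utf-8"
                html = resp.read().decode(charset, errors="replace")
            return extract_text_from_html(html)
        except (OSError, ValueError, LookupError):
            if attempt < retries - 1:
                time.sleep(1.0 * (attempt + 1))  # simple linear back-off
    return ""
```

Separating `extract_text_from_html` from the download step lets the parsing and cleaning logic be checked on saved pages without network access.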

3.1.2. Structured Data Extraction and Cleaning for PDF-Based Policy Documents

To process typical PDF-format policy documents, such as the 14th Five-Year related plans, Blue Books, and regional electricity development reports, this study combines page-by-page reading with regular expression-based cleaning to extract the core textual content. The overall logic of the module is summarized in Algorithm 2.
Algorithm 2. Structured data extraction and cleaning for PDF-based policy documents
Function ExtractTextFromPDF(pdf_path):
  Initialize empty string TEXT
  For each page in PDF document:
    Extract page_text = GetTextFromPage(page)
    Append page_text to TEXT
  Clean TEXT by:
    - Removing special symbols
    - Replacing multiple whitespace with single space
  Return cleaned TEXT
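A runnable sketch of Algorithm 2 follows. The page reader assumes PyPDF2 2.x or later (`PdfReader` and `extract_text`). The regular expressions for table-of-contents lines, page footers, the first chapter marker, and the set of retained characters are illustrative assumptions, not the patterns used in this study.

```python
import re

# Hypothetical patterns; real documents may need different ones.
PAGE_FOOTER = re.compile(r"^[ \t]*[\u2014\u2013-]*[ \t]*\d+[ \t]*[\u2014\u2013-]*[ \t]*$", re.MULTILINE)
TOC_LINE = re.compile(r"^.*(?:\.{3,}|…{2,}|·{3,})[ \t]*\d+[ \t]*$", re.MULTILINE)
CHAPTER_ONE = re.compile(r"第一章|Chapter\s+1\b")
SPECIAL = re.compile(r"[^\u4e00-\u9fffA-Za-z0-9，。、；：？！（）《》“”‘’％%.,;:()\-\s]")
WHITESPACE = re.compile(r"\s+")


def read_pdf_pages(pdf_path):
    """Read page texts with PyPDF2 (third-party, imported lazily)."""
    from PyPDF2 import PdfReader
    return [page.extract_text() or "" for page in PdfReader(pdf_path).pages]


def clean_pdf_text(page_texts):
    """Join pages, drop footers and contents entries, keep the main body, normalize."""
    text = "\n".join(page_texts)
    text = PAGE_FOOTER.sub("", text)   # lines holding only a page number
    text = TOC_LINE.sub("", text)      # "title ...... 12" style contents entries
    match = CHAPTER_ONE.search(text)   # first chapter title marks the main body
    if match:
        text = text[match.start():]
    text = SPECIAL.sub("", text)       # remove special symbols
    return WHITESPACE.sub(" ", text).strip()
```

A typical call would be `clean_pdf_text(read_pdf_pages(path))`; keeping the two functions apart means the cleaning rules can be tuned on plain strings.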

3.2. BERT Model Training and Ablation Study

3.2.1. Training of the BERT Model

To enable accurate recognition of multiple policy task directions, an enhanced multi-label classification model based on bert-base-chinese was developed, as shown in Figure 3. This subsection describes its architecture. The core objective is to enable the model not only to understand the general meaning of policy text but also to precisely identify and assign multiple relevant policy task labels to it, mimicking how a domain expert would read and categorize the documents.
(1) Input Configuration: Texts are tokenized with a 512-character limit. Domain-specific terms (e.g., “source-grid-load-storage integration”, “virtual power plant”) are added to the tokenizer. The selection process was designed to be systematic and objective, based solely on the authoritative policy texts under study:
Candidate Term Extraction: An initial list of candidate terms was compiled by first segmenting the text of core policy documents (e.g., the Action Plan for Accelerating the Construction of a New-Type Power System and the 14th Five-Year Plan for Modern Energy System). N-grams (phrases consisting of 2 to 4 words) were then extracted and ranked by their Term Frequency (TF) and TF-IDF (Term Frequency-Inverse Document Frequency) scores within this corpus. This quantitative approach ensured that high-frequency, content-specific phrases were prioritized.
Manual Filtering and Finalization: The top-ranked n-grams were then manually reviewed to retain only those that represented semantically complete and technically critical concepts within the domain (e.g., “source-grid-load-storage integration”, “virtual power plant”, “electricity-carbon coupling”). Redundant, overly generic, or incomplete phrases were removed. This process resulted in a finalized lexicon of 15 domain-specific terms, which were subsequently added to the tokenizer.
(2) Label Attention: Each label is assigned an embedding vector, serving as the Query in a multi-head attention mechanism with BERT hidden states as Key/Value. Outputs are passed through linear layers and sigmoid activations to generate multi-label predictions.
(3) Class Imbalance Handling: A weighted BCE loss (via pos_weight) is used to improve the learning of low-frequency labels.
(4) Threshold and Top-k Strategy: Thresholds from 0.1 to 0.6 are scanned on the validation set to optimize macro-F1. At prediction time, at most three labels are retained per sample, and if no label exceeds its threshold, the label with the highest probability is assigned instead. This top-k fallback mechanism is a common post-processing technique in multi-label classification. It is primarily employed to prevent “empty predictions” (i.e., instances where no label exceeds the dynamic threshold), which would render those samples unclassifiable and unusable in downstream analysis. Although this strategy may occasionally force an incorrect label onto particularly challenging samples, it guarantees that every text segment receives at least one prediction, maintaining the integrity and completeness of the dataset for subsequent structural analysis. Our empirical validation confirmed that the benefits of achieving full sample coverage outweigh the potential introduction of minor noise. The overall logic of the module is summarized in Algorithm 3:
Algorithm 3. Domain-optimized BERT for multi-label classification
Input:
  - Labeled training set (texts + multi-labels)
  - Unlabeled test set (texts)
  - Pretrained BERT model (e.g., “bert-base-chinese”)
Output:
  - Trained multi-label classifier
  - Top-N predicted labels for each test text
  - Dynamic threshold file
  - Final label frequency visualization
------------------------------------------------------------
Training Stage
1. Load labeled dataset from Excel
  - Clean empty labels
  - Convert multi-label strings to binary indicator matrix
2. Initialize tokenizer and inject custom tokens
  - Tokens include domain-specific phrases (e.g., “virtual power plant”)
3. Tokenize text with max_length = 512
  - Return PyTorch (v2.3.1) tensors
  - Construct dataset for training/validation
4. Initialize LabelAttentionClassifier:
  - BERT encoder → token embeddings
  - Learnable label embeddings
  - Multi-head attention: label embedding as query
  - Output: sigmoid score for each label
5. Compute positive class weights (inverse label frequency)
  - Assign to BCEWithLogitsLoss (weighted binary cross entropy)
6. Train model using HuggingFace Trainer
  - Monitor macro-F1 on validation set
  - Use standard Adam optimizer and learning rate schedule
7. Save:
  - Trained model weights
  - Tokenizer config
  - Label encoder (MultiLabelBinarizer class list)
  - Dynamic threshold file (JSON)
------------------------------------------------------------
Threshold Scanning (Validation)
8. For each label:
  a. Predict probability scores on validation set
  b. For thresholds t ∈ [0.1, 0.9] step 0.02:
    - Compute binary predictions using threshold t
    - Evaluate F1 score
  c. Select t that yields highest F1 → save as label-specific optimal threshold
9. Save all best thresholds into JSON (dynamic_thresholds.json)
------------------------------------------------------------
Inference and Prediction
10. Load new test set from Excel
11. Tokenize using same tokenizer
12. Perform forward pass to get logits
13. Apply sigmoid to get probabilities
14. For each sample:
  a. Adjust label probabilities if label is in penalty list
  b. Compare with dynamic thresholds:
    - If prob > threshold: label is selected
  c. Enforce label count control:
    - If selected > 3 → only retain top-3 probs
    - If selected == 0 → fallback to top-1 prediction
15. Store:
  - Top-N predicted labels
  - Main label (highest probability)
  - Full probability vector
16. Save results to Excel and plot label frequency histogram
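Steps 8 and 14 of Algorithm 3 can be sketched in plain Python. The scan range follows the listing ([0.1, 0.9] in steps of 0.02) but is exposed as parameters, since other ranges are equally possible. The penalty-list adjustment of step 14a is omitted because its contents are not specified, and the function names are hypothetical.

```python
def f1_binary(y_true, y_pred):
    """F1 for one label from 0/1 lists; returns 0.0 when nothing is predicted or true."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def scan_thresholds(val_probs, val_labels, low=0.1, high=0.9, step=0.02):
    """Pick, for every label, the threshold that maximizes validation F1."""
    n_steps = int(round((high - low) / step))
    grid = [round(low + i * step, 4) for i in range(n_steps + 1)]
    best = []
    for j in range(len(val_probs[0])):
        y_true = [row[j] for row in val_labels]
        scores = [row[j] for row in val_probs]
        best_t, best_f1 = grid[0], -1.0
        for t in grid:
            f1 = f1_binary(y_true, [1 if s > t else 0 for s in scores])
            if f1 > best_f1:  # strict comparison keeps the lowest best threshold
                best_t, best_f1 = t, f1
        best.append(best_t)
    return best


def decode(probs, thresholds, max_labels=3):
    """Apply per-label thresholds, cap at max_labels, fall back to top-1 if empty."""
    selected = [j for j, (p, t) in enumerate(zip(probs, thresholds)) if p > t]
    if len(selected) > max_labels:
        selected = sorted(selected, key=lambda j: probs[j], reverse=True)[:max_labels]
    if not selected:
        selected = [max(range(len(probs)), key=lambda j: probs[j])]
    return sorted(selected)
```

For example, `decode([0.2, 0.3, 0.1], [0.5, 0.5, 0.5])` returns `[1]`: no label clears its threshold, so the most probable label is kept.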

3.2.2. Ablation Study Configuration

To assess the contribution of each module in the proposed model, a series of ablation studies was conducted by systematically removing or replacing components. The experiments aimed to evaluate the following: (1) the effect of label embeddings and attention on multi-label representation, (2) the role of imbalance-aware weighting in predicting low-frequency labels, (3) the impact of dynamic thresholding and Top-k fallback on sample-level coverage, and (4) potential redundancy or synergy among modules.
Six experimental configurations were designed by altering the complete model (Configuration 1), as shown in Table 4. Each configuration was run five times with identical training data and hyperparameter settings. The mean Macro-F1, Micro-F1, and Samples-F1 scores were used for comparison, quantifying the marginal gain in model performance and label coverage contributed by each module.

3.2.3. Baseline Methods

We adopt two representative baselines to cover the ends of the method spectrum while keeping the comparison focused:
(B1) Linear SVM (TF-IDF). Documents are represented by word-level TF-IDF with sublinear TF, L2 normalization, and a 50k max-features cap. A linear SVM (hinge loss) is trained in a one-vs-rest fashion with class-balanced weights; the penalty C∈{0.5, 1, 2} is selected on the validation set.
(B2) RoBERTa-wwm-ext (fine-tuned). We fine-tune the Chinese RoBERTa-wwm-ext encoder with a sigmoid multi-label head (binary cross-entropy). Max sequence length 512, batch size 16, learning rate $2 \times 10^{-5}$, 3 epochs with early stopping on validation Macro-F1.
All models share the same train/validation/test split, identical preprocessing, and label binarization. Per-label thresholds are selected on the validation set to maximize F1. Metrics are Macro-F1, Micro-F1, Samples-F1. Implementations follow scikit-learn for SVM and HuggingFace transformers for RoBERTa.
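The three reported metrics differ only in where the averaging happens, which the following pure-Python sketch makes explicit. It mirrors the usual definitions (as in scikit-learn's `f1_score` with `average` set to `"macro"`, `"micro"`, or `"samples"`), treats an undefined F1 as 0, and uses hypothetical function names.

```python
def f1_from_counts(tp, fp, fn):
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def confusion_counts(true_vals, pred_vals):
    """True positives, false positives, false negatives for two 0/1 sequences."""
    tp = sum(1 for t, p in zip(true_vals, pred_vals) if t and p)
    fp = sum(1 for t, p in zip(true_vals, pred_vals) if not t and p)
    fn = sum(1 for t, p in zip(true_vals, pred_vals) if t and not p)
    return tp, fp, fn


def multilabel_f1(y_true, y_pred):
    n, m = len(y_true), len(y_true[0])
    label_counts = [
        confusion_counts([r[j] for r in y_true], [r[j] for r in y_pred])
        for j in range(m)
    ]
    # Macro: average the per-label F1 scores (every label counts equally).
    macro = sum(f1_from_counts(*c) for c in label_counts) / m
    # Micro: pool the counts over all labels, then compute one F1.
    micro = f1_from_counts(*(sum(c[i] for c in label_counts) for i in range(3)))
    # Samples: compute F1 per text segment, then average over segments.
    samples = sum(
        f1_from_counts(*confusion_counts(t, p)) for t, p in zip(y_true, y_pred)
    ) / n
    return {"macro": macro, "micro": micro, "samples": samples}
```

Macro-F1 is the most sensitive of the three to rare labels, which is why it is the natural target when tuning for long-tail policy tasks.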

3.3. Evaluation and Analysis of Predicted Labels

Using the PDF policy document extraction model and the web scraping model, this study completed the cleaning and extraction of textual content from 200 online news articles, 100 official policy releases, and 10 planning documents. Subsequently, the extracted texts were subjected to unified label prediction using the improved BERT model, which had been trained based on the Action Plan for Accelerating the Construction of a New-Type Power System. The predicted labels were then analyzed from three perspectives: structural popularity analysis, co-occurrence relationship analysis, and priority ranking analysis. The detailed analysis results are presented below.

3.3.1. Analysis of the Popularity Structure of Labels

To compare how strongly each label is expressed across text sources, this study uses a statistic based on the predicted label probabilities. The method is inspired by the “source-grouped label heat aggregation” framework commonly used in multi-source policy research validation. Assume that the text dataset $D$ is divided into $K$ subsets, each corresponding to a different text source (e.g., policies, news articles, planning documents). The $k$-th subset is denoted $D^{(k)} = \{x_1^{(k)}, x_2^{(k)}, \ldots, x_{n_k}^{(k)}\}$, where $n_k$ is the number of samples from source $k$. For a given label $t_j \in T$ (where $T$ denotes the complete label set with cardinality $|T| = m$), the mean predicted probability of label $t_j$ over the samples of source $k$ is given by [31]:
$$P_j^{(k)} = \frac{1}{n_k} \sum_{i=1}^{n_k} p_{ij}^{(k)}$$
where $p_{ij}^{(k)} \in [0, 1]$ represents the model’s predicted probability that sample $x_i^{(k)}$ belongs to label $t_j$.
Computing this mean for every label $j = 1, \ldots, m$ within each source $k = 1, \ldots, K$ yields the two-dimensional source-label heat matrix $P$ [31]:
$$P = \left[ P_j^{(k)} \right]_{m \times K}$$
This heat matrix enables comparison of the explicit expression strength of different labels across text sources and helps identify dominant labels that are particularly emphasized within specific sources. The focused label subset is then selected as follows [32]:
$$T_{\mathrm{topN}}^{(k)} = \mathrm{TopN}\left( \{ P_j^{(k)} \}_{j=1}^{m} \right)$$
where $\mathrm{TopN}(\cdot)$ selects the $N$ labels with the highest heat values within source $k$; these labels form the core label group for that source.
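The heat matrix and the TopN selection can be computed with a few lines of Python. The sketch below is illustrative: the dictionary layout, the source names, and the function names are assumptions, and the heat of a label is simply the mean of its predicted probabilities within one source.

```python
def source_label_heat(probs_by_source):
    """Map each source k to its heat vector [P_1^(k), ..., P_m^(k)]."""
    heat = {}
    for source, rows in probs_by_source.items():
        n_k = len(rows)
        m = len(rows[0])
        heat[source] = [sum(row[j] for row in rows) / n_k for j in range(m)]
    return heat


def top_n_labels(heat_row, label_names, n):
    """Return the names of the n labels with the highest heat in one source."""
    order = sorted(range(len(heat_row)), key=lambda j: heat_row[j], reverse=True)
    return [label_names[j] for j in order[:n]]
```

Each value of the returned dictionary is one column of the matrix $P$, so comparing the same index across sources shows where a task is emphasized most.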

3.3.2. Analysis of Label Co-Occurrence Relationships

In multi-label classification tasks, labels often exhibit structured co-occurrence relationships, particularly in energy policy texts where such interconnections are critical. To reveal the collaborative structures among predicted policy labels, a Label Co-occurrence Network was constructed based on the model’s predicted label outputs. This method is widely applied in policy text network analysis and complex system evolution studies. This co-occurrence matrix can be intuitively understood as a network where policy tasks are connected based on their frequency of being mentioned together. For instance, if the labels “Build shared energy storage power stations” and “Enhance grid flexibility” are frequently predicted together in the same text segments, a thick link would form between them in the network, indicating a strong policy-level association between these two tasks. Assume that the complete sample set is denoted as [33] D = { x 1 , x 2 , , x n } , where the model predicts for each sample xi a binary label vector y ^ i { 0 , 1 } m , with m representing the total number of labels. A value of 1 indicates that the label is predicted as present.
The co-occurrence count $C_{jk}$ between labels $(t_j, t_k)$ is defined as follows [34]:
$$C_{jk} = \sum_{i=1}^{n} \hat{y}_{ij} \, \hat{y}_{ik}, \quad j, k \in \{1, 2, \dots, m\}, \ j \neq k$$
where $\hat{y}_{ij}$ denotes the prediction result (0 or 1) of the j-th label for the i-th sample and $C_{jk}$ represents the number of times labels $t_j$ and $t_k$ are predicted together in the same sample.
All pairwise co-occurrence counts are organized into a symmetric label co-occurrence matrix [35]:
$$C = \left[ C_{jk} \right]_{m \times m}$$
The label co-occurrence network constructed from C serves as an intermediate structure to model the task interdependencies and system evolution patterns. It can be used to identify core task directions within policy documents, providing a logical foundation for subsequent phased task structuring and priority sequence evaluation.
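As an illustration of the definition above, the co-occurrence counts can be accumulated directly from the binary prediction vectors. The sketch below is an assumption-laden toy: the function name `cooccurrence_matrix` and the four sample predictions are hypothetical, and the diagonal is left at zero because the definition only covers pairs with j ≠ k.

```python
from typing import List


def cooccurrence_matrix(preds: List[List[int]]) -> List[List[int]]:
    """Count, for each label pair (j, k) with j != k, the samples predicting both."""
    m = len(preds[0])
    C = [[0] * m for _ in range(m)]
    for y in preds:
        active = [j for j, value in enumerate(y) if value == 1]
        for j in active:
            for k in active:
                if j != k:
                    C[j][k] += 1
    return C


# Toy predictions (hypothetical): four samples, three labels.
preds = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
    [1, 0, 0],
]
C = cooccurrence_matrix(preds)
for row in C:
    print(row)
```

Because every co-predicted pair is counted in both directions, the resulting matrix is symmetric by construction.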

3.3.3. Evaluation of Label Priorities

To further identify the core task directions reflected in policy documents and their system evolution characteristics, this study proposes a composite evaluation method termed the PEI [36]. This method integrates label heat (frequency) and structural centrality features. The core idea is that the priority of a label should not be determined solely by its explicit frequency (heat) within the text corpus, but should also reflect its structural importance (connectivity and centrality) within the label co-occurrence network. This approach enables a more efficient task structuring process and supports downstream applications in strategic analysis and task decomposition [37].
The PEI for each label $t_j$ is defined as follows [38]:
$$PEI_j = \alpha P_j + \beta S_j$$
where $P_j$ denotes the predicted frequency (heat) of label $t_j$ (e.g., its occurrence rate across different text sources), $S_j$ represents the normalized structural centrality score of label $t_j$ in the label co-occurrence network, and $\alpha, \beta \in [0, 1]$ are weighting coefficients that regulate the relative importance of the frequency term and the structural centrality term and can be adjusted according to application needs. In this study, $\alpha = 0.6$ and $\beta = 0.4$ are used, emphasizing frequency while maintaining a strong sensitivity to structural importance. This configuration was determined through preliminary grid search experiments on the validation set, aiming to balance the influence of a label’s explicit prominence in the text against its implicit structural role within the task network. A higher weight for frequency ($\alpha$) reflects that a task explicitly mentioned across numerous documents is likely of immediate importance. However, retaining a significant weight for centrality ($\beta$) ensures that tasks acting as critical connective hubs between multiple domains are also prioritized, capturing their latent strategic value. This weighting scheme aligns with common practices in multi-criteria decision-making (MCDM), where primary factors are often assigned greater but not absolute weight [39].
To obtain the structural centrality score $S_j$, the degree of each label node $t_j$ in the co-occurrence matrix $C$ is calculated and normalized [40]:
$$S_j = \frac{\deg(t_j)}{m - 1}$$
$$\deg(t_j) = \sum_{k=1}^{m} \mathbb{1}\left( C_{jk} > \tau \right)$$
where $\deg(t_j)$ denotes the degree of label $t_j$, i.e., the number of other labels whose co-occurrence with $t_j$ exceeds the threshold $\tau$, and $S_j$ is this degree normalized to the [0, 1] range by the maximum possible degree $m - 1$. The co-occurrence threshold $\tau$ used for degree calculation is set to 0.25 in this study, typically selected from the range $\tau$ = 0.2–0.3 [41].
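The thresholded degree and its normalization can be sketched as follows. The text does not state how integer co-occurrence counts are compared with a threshold of 0.25, so this example assumes that counts are first rescaled by the largest off-diagonal count; that rescaling, the function name `degree_centrality`, and the toy matrix are assumptions of this illustration only.

```python
from typing import List


def degree_centrality(C: List[List[float]], tau: float = 0.25) -> List[float]:
    """Normalized degree: share of other labels linked above the threshold.

    Assumption of this sketch: counts are divided by the largest
    off-diagonal count before being compared with tau.
    """
    m = len(C)
    peak = max(C[j][k] for j in range(m) for k in range(m) if j != k)
    scores = []
    for j in range(m):
        degree = sum(
            1
            for k in range(m)
            if k != j and peak > 0 and C[j][k] / peak > tau
        )
        scores.append(degree / (m - 1))
    return scores


# Toy symmetric co-occurrence counts (hypothetical), four labels.
C = [
    [0, 8, 4, 1],
    [8, 0, 2, 0],
    [4, 2, 0, 3],
    [1, 0, 3, 0],
]
S = degree_centrality(C, tau=0.25)
print([round(s, 3) for s in S])
```

Note that the comparison is strict, so a rescaled count exactly equal to the threshold does not create a link.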
After calculating the PEI for all labels, the priority ranking list $T_{\mathrm{priority}}$ is obtained by sorting the labels in descending order of their PEI values [42]:
$$T_{\mathrm{priority}} = \mathrm{SortDesc}\left( \{ PEI_j \}_{j=1}^{m} \right)$$
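A minimal sketch of the PEI computation and ranking, using the weights α = 0.6 and β = 0.4 stated above. The label names "A", "B", and "C" and their heat and centrality values are hypothetical and chosen only to show how a low-frequency but well-connected label can move up the ranking.

```python
from typing import Dict, List, Tuple


def pei_ranking(
    heat: Dict[str, float],
    centrality: Dict[str, float],
    alpha: float = 0.6,
    beta: float = 0.4,
) -> List[Tuple[str, float]]:
    """Combine heat and centrality into PEI and sort in descending order."""
    scores = {
        label: alpha * heat[label] + beta * centrality[label]
        for label in heat
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy values (hypothetical), both already scaled to [0, 1].
heat = {"A": 0.9, "B": 0.3, "C": 0.5}
centrality = {"A": 0.5, "B": 0.9, "C": 0.2}

ranking = pei_ranking(heat, centrality)
for label, score in ranking:
    print(label, round(score, 2))
```

Ranking by heat alone would order these labels A, C, B; with the centrality term included, B overtakes C, which mirrors the intended effect of the composite index.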

4. Results and Discussion

4.1. Performance Analysis of BERT Model Training

To evaluate the performance of the constructed BERT-based multi-label classification model in the task of automatic label identification for policy texts, the training corpus was divided into a training set and a validation set. Supervised training was conducted over 10 epochs on these two datasets, during which the Macro-F1, Micro-F1, and Samples-F1 metrics, as well as the validation loss, were recorded after each epoch. The corresponding results are summarized in Table 5.
From an overall perspective, the model exhibited weak performance during the initial training phase (Epochs 1–3), with a Macro-F1 of only 0.08 and a Samples-F1 of 0.09, indicating that the model had not yet effectively captured the semantic co-occurrence structures among labels. This phase corresponds to the “cold start” period of training. As training progressed, a significant improvement was observed in all F1 metrics starting from Epoch 4, with the Macro-F1 rising to 0.53 and the Samples-F1 reaching 0.56 by Epoch 6, signaling the onset of a rapid convergence phase.
Figure 4 visualizes the dynamic evolution of performance metrics across training epochs from two perspectives: a three-dimensional area plot and a radar chart. Examination of the area stacking plot reveals that Epochs 7–10 correspond to a performance stabilization period, during which the Macro-F1 consistently remained above 0.80, and the Micro-F1 and Samples-F1 steadily converged, reaching 0.84 and 0.85, respectively, at Epoch 10. Meanwhile, the validation loss continuously declined to a minimum value of 0.25 without oscillations or rebounds, confirming that the model did not exhibit overfitting during extended training and maintained strong generalization capability.
Notably, in the radar chart, the coverage area for each F1 metric expanded progressively, whereas the area corresponding to the validation loss shrank markedly, further illustrating the model’s convergence trajectory and stability from a multi-dimensional perspective. In particular, the continuous improvement of Samples-F1 highlights the model’s robustness in capturing sample-level label structures, such as variability in the number of labels per sample and semantic clustering of labels, demonstrating strong structural awareness even under the high task complexity of policy text modeling.
The training results validate the feasibility and structural adaptability of the proposed BERT model for multi-label recognition tasks in policy documents, providing a solid foundation for subsequent label structure modeling and priority inference.
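For readers less familiar with the three averaging schemes reported above, the following plain-Python sketch shows how Macro-F1, Micro-F1, and Samples-F1 differ. It is not the evaluation code used in the study; the function names, the convention of scoring an empty denominator as 0.0, and the toy labels are assumptions of this example.

```python
from typing import Dict, List


def _f1(tp: int, fp: int, fn: int) -> float:
    """F1 from counts; returns 0.0 when there is nothing to score (assumed convention)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def _counts(true: List[int], pred: List[int]):
    tp = sum(1 for t, p in zip(true, pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(true, pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(true, pred) if t == 1 and p == 0)
    return tp, fp, fn


def multilabel_f1(y_true: List[List[int]], y_pred: List[List[int]]) -> Dict[str, float]:
    n, m = len(y_true), len(y_true[0])
    # Per-label counts (columns) drive Macro-F1 and Micro-F1.
    per_label = [
        _counts([row[j] for row in y_true], [row[j] for row in y_pred])
        for j in range(m)
    ]
    macro = sum(_f1(*c) for c in per_label) / m
    micro = _f1(
        sum(c[0] for c in per_label),
        sum(c[1] for c in per_label),
        sum(c[2] for c in per_label),
    )
    # Per-sample counts (rows) drive Samples-F1.
    samples = sum(_f1(*_counts(t, p)) for t, p in zip(y_true, y_pred)) / n
    return {"macro": macro, "micro": micro, "samples": samples}


# Toy labels (hypothetical): three samples, three labels.
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
y_pred = [[1, 0, 0], [0, 1, 1], [1, 1, 0]]
scores = multilabel_f1(y_true, y_pred)
print({name: round(value, 3) for name, value in scores.items()})
```

In this toy case the rare third label is never predicted correctly, which pulls Macro-F1 well below Micro-F1; this is why Macro-F1 is the more sensitive indicator for long-tail labels.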

4.2. Performance Analysis of Ablation Study

To systematically evaluate the marginal contributions of each structural component in the BERT model to the final performance, seven groups of ablation experiments (Configurations from 1 to 7) were designed. Core modules such as the label-wise attention mechanism, terminology injection, dynamic thresholding, label weighting adjustment, and Top-k compensation strategy were sequentially removed or replaced. Under identical training conditions, the Macro-F1, Micro-F1, and Samples-F1 metrics were calculated to analyze the impact of each module on model performance. The functional composition and corresponding scores for each experimental configuration are presented in Table 6, while Figure 5 provides an intuitive visualization of the performance trends through a heatmap of F1 scores.
As shown in the table and heatmap, the full-scale model (Configuration 1) consistently achieved the best performance across all metrics, with a Macro-F1 of 0.831, Micro-F1 of 0.849, and Samples-F1 of 0.855, demonstrating the robustness and generalization capability of the complete structural configuration in multi-label classification tasks.
When the terminology injection mechanism was disabled (Configuration 2), a significant performance drop was observed, with Macro-F1 decreasing to 0.723. This highlights the critical role of introducing domain-specific terms as external priors, significantly enhancing the semantic activation of domain-relevant labels, and confirming their importance in domain-specific modeling.
Replacing the dynamic thresholding strategy with a fixed threshold (Configuration 3) resulted in a decline in Macro-F1 to 0.802, indicating that dynamic thresholding significantly improves the adaptability of classification boundaries in multi-label prediction.
Furthermore, in Configuration 4, where label weighting adjustment was disabled (i.e., using an unweighted BCE loss without label frequency correction), the Macro-F1 dropped to 0.768, demonstrating that this strategy effectively mitigates training bias arising from label imbalance and improves the model’s treatment of long-tail labels.
Configuration 5, where the Top-k prediction strategy was entirely removed, showed a uniform decline of all three F1 metrics to approximately 0.74. This result confirms the critical role of the Top-k mechanism in preventing empty predictions and maintaining label completeness at the sample level. In Configuration 6, where terminology injection, dynamic thresholding, and label weighting were jointly removed to construct a minimized structural model, the Macro-F1 further dropped to 0.698, the lowest among all configurations. This result highlights the cumulative performance gains achieved through the synergy of the individual modules.
Finally, in Configuration 7, the label-wise attention module was removed, forcing the model to rely solely on the [CLS] token for multi-label classification. The Macro-F1 declined to 0.755, illustrating that the attention mechanism provides significant fine-grained modeling advantages in capturing the semantic correspondence between labels and textual content. The heatmap shown in Figure 5 visually depicts the differential impacts of module removal on Macro-F1, Micro-F1, and Samples-F1. A color gradient from yellow to purple indicates the performance improvement trend. The full-scale model appears as the deepest blue region, representing the optimal configuration, while the minimized model (Configuration 6) appears as a light yellow block, showing the worst performance and forming a strong contrast.
The proposed improved BERT model, based on the collaborative optimization of multiple functional modules, achieved significantly superior performance compared to the degraded models. The effectiveness of each module has been individually validated through systematic experiments, providing a reliable semantic label foundation for subsequent structural insights and evolution trend prediction.
The significant performance drop observed in the ablation study when domain terminology was removed (Configuration 2, Macro-F1: 0.723 vs. 0.831) underscores a pivotal finding: accurate semantic understanding of electricity policy texts is highly dependent on domain-specific knowledge. This observed “dependence” is not a limitation but a deliberate design feature of our framework. The model is purpose-built for the specialized domain of energy policy analysis, not for general-purpose text classification. The injection of curated terminology is therefore a core component that enables the model to correctly interpret technically nuanced phrases, which is essential for achieving high precision in this domain. Consequently, while the model’s applicability is specialized, its performance within its intended scope is robust and highly effective.
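Two of the ablated components, dynamic thresholding and the Top-k strategy, lend themselves to a short illustration. The sketch below is one plausible reading and not the authors’ implementation: it assumes that thresholds are chosen per label by scanning a grid for the best validation F1, and that Top-k acts as a fallback when no label passes its threshold. The function names, the grid, and the toy values are hypothetical.

```python
from typing import List, Sequence


def scan_threshold(probs: Sequence[float], truths: Sequence[int],
                   grid: Sequence[float]) -> float:
    """Pick the grid value that maximizes F1 for one label on validation data."""
    best_t, best_f1 = grid[0], -1.0
    for t in grid:
        tp = sum(1 for p, y in zip(probs, truths) if p >= t and y == 1)
        fp = sum(1 for p, y in zip(probs, truths) if p >= t and y == 0)
        fn = sum(1 for p, y in zip(probs, truths) if p < t and y == 1)
        denom = 2 * tp + fp + fn
        f1 = 2 * tp / denom if denom else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t


def decode_with_topk(probs: Sequence[float], thresholds: Sequence[float],
                     k: int = 1) -> List[int]:
    """Apply per-label thresholds; if nothing fires, keep the k most probable labels."""
    pred = [1 if p >= t else 0 for p, t in zip(probs, thresholds)]
    if not any(pred):
        top = sorted(range(len(probs)), key=lambda j: probs[j], reverse=True)[:k]
        for j in top:
            pred[j] = 1
    return pred


# Toy validation scores for a single label (hypothetical).
best = scan_threshold([0.9, 0.6, 0.4, 0.2], [1, 1, 0, 0], grid=[0.3, 0.5, 0.7])
print(best)

# A sample where no label reaches its threshold falls back to the top-1 label.
print(decode_with_topk([0.2, 0.4, 0.1], [0.5, 0.5, 0.5], k=1))
print(decode_with_topk([0.7, 0.4, 0.1], [0.5, 0.5, 0.5], k=1))
```

The fallback only changes samples that would otherwise receive an empty prediction, which matches the role attributed to the Top-k mechanism in the ablation discussion.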

4.3. Comparison with Baselines

Table A1 summarizes test-set performance. The Full-Scale Model achieves 0.831/0.849/0.855 (Macro-/Micro-/Samples-F1), surpassing RoBERTa-wwm-ext (0.804/0.812/0.817) by +0.027, +0.037, and +0.038, respectively, and clearly exceeding the classical Linear SVM (0.748/0.741/0.752). The margin over RoBERTa indicates that components unique to our system—attention, terminology handling, and adaptive thresholds—yield cumulative gains, consistent with the ablation study.
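The margins quoted above follow directly from the Table A1 scores, as this short check shows. Only the reported F1 values are used; the dictionary layout and the helper name `margins` are choices made for this example.

```python
# F1 scores as reported in Table A1 (Macro-, Micro-, Samples-F1).
full_model = {"macro": 0.831, "micro": 0.849, "samples": 0.855}
baselines = {
    "RoBERTa-wwm-ext": {"macro": 0.804, "micro": 0.812, "samples": 0.817},
    "Linear SVM (TF-IDF)": {"macro": 0.748, "micro": 0.741, "samples": 0.752},
}


def margins(reference: dict, baseline: dict) -> dict:
    """Absolute F1 gain of the reference model over a baseline, per metric."""
    return {metric: round(reference[metric] - baseline[metric], 3) for metric in reference}


gains = {name: margins(full_model, scores) for name, scores in baselines.items()}
for name, gain in gains.items():
    print(name, gain)
```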

4.4. Performance Analysis of Label Prediction

4.4.1. Results of Structural Analysis of Label Popularity

To further reveal the differences in label attention and structural distribution characteristics across different types of policy texts, this study constructed a label heat matrix P based on the predicted label set, where m denotes the number of label categories and s = 3 represents the number of text sources (specifically, Strategic Energy Planning Documents, Digital Energy News Platforms, and Official Policy Directives on Government Portals). Each element $p_{ij}$ of the matrix indicates the frequency of label $l_i$ appearing in the j-th category of text.
Table 7 lists the top 20 labels with the highest overall frequencies, while Figure 6a and Figure 6b respectively present the normalized proportional distribution and absolute frequency distribution across the three sources. Together, they clearly illustrate the variation in label focus and task framing strategies among the different document types.
From the results, the label Build system-friendly new energy power stations ranks consistently high across all three sources, with 107 mentions in strategic documents, 80 in news platforms, and 24 in government directives. This indicates its role as a core construction pathway for the new-type power system, with strong emphasis at the top-level planning stage, but limited expression in officially released policies. In contrast, labels such as Optimize and strengthen the main grid framework and Increase the proportion of renewable energy transmission show higher frequencies in news media (80 and 65 mentions, respectively), suggesting heightened media attention to grid modernization and renewable energy integration progress.
Several technology-focused labels also show dominance in strategic planning texts. For instance, Explore application of new energy storage technologies and Apply advanced technologies in transmission channels appear 61 and 28 times, respectively, in planning documents, far more than in government directives, indicating their emphasis as long-term strategic initiatives rather than immediate policy actions.
Notably, certain labels display a “media-strong, policy-weak” distribution pattern. For example, Design intelligent scheduling systems and Promote application of grid-forming technology receive modest attention in news articles (8 and 29 mentions), but appear only 4 and 5 times in government directives, reflecting their status as topics of technical discussion rather than standardized policy priorities.
Labels such as Build shared energy storage power stations and Implement coordination between computing power and electricity are distributed unevenly across planning and news sources (7/34 and 47/14 mentions, respectively). The latter is nearly absent from official policies (2 mentions), implying that it is still in the exploratory stage without an established normative framework, whereas the former already appears 30 times in official policies despite its low presence in planning documents.
In Figure 6a, the horizontal stacked bar chart further reveals structural composition differences across sources. Many labels show over 50% of their proportional mentions in news media—such as Next-generation coal demonstration, Smart scheduling systems, and Main grid optimization—highlighting the media’s focus on engineering implementation and deployment narratives. Conversely, Figure 6b reinforces the dominant role of planning texts in defining core strategic tasks, as 7 of the top 10 high-frequency labels lead in the Strategic Energy Planning category.
In summary, the analysis reveals a distinct three-stage divergence in how policy tasks are expressed across sources: from top-level planning → public discourse → formal policy issuance. These structural discrepancies in label distribution and semantic framing lay the foundation for follow-up studies on policy alignment evaluation and label priority scoring.
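The normalized proportional view in Figure 6a can be reproduced for individual labels from the counts quoted in this subsection. The sketch below uses only those quoted counts; the helper name `source_shares` and the tuple ordering (planning, news, policy) are conventions chosen for this example.

```python
SOURCES = ("planning", "news", "policy")

# Mention counts quoted in the text, ordered as (planning, news, policy).
counts = {
    "Build system-friendly new energy power stations": (107, 80, 24),
    "Build shared energy storage power stations": (7, 34, 30),
    "Implement coordination between computing power and electricity": (47, 14, 2),
}


def source_shares(row):
    """Share of a label's mentions contributed by each source."""
    total = sum(row)
    return tuple(round(count / total, 3) for count in row)


shares = {label: source_shares(row) for label, row in counts.items()}
for label, row in shares.items():
    print(label, dict(zip(SOURCES, row)))
```

Read this way, roughly half of the mentions of the first label come from planning documents, while the third label draws about three quarters of its mentions from planning documents and only about 3% from official policies.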

4.4.2. Results of the Analysis of Label Co-Occurrence Relationships

To investigate the structural coupling relationships and synergistic evolution pathways among policy tasks in the context of China’s new-type power system, a label co-occurrence matrix was constructed based on predicted labels from three text sources. Using this matrix, a label co-occurrence network was generated, as illustrated in Figure 7. In the network, each node represents a task label, with node size indicating its frequency and edge thickness denoting the frequency of co-occurrence between two labels within the same paragraph. The matrix is symmetric, with diagonal elements indicating independent label frequencies.
From the overall network topology, several high-frequency semantic coupling hubs are evident. Notably, the pair Build system-friendly new energy power stations and Optimize and strengthen the main grid framework exhibits a co-occurrence frequency of 81, forming one of the thickest edges in the graph. This strong structural coupling highlights the interdependence between generation-side planning and transmission backbone design, which together form the foundation for building a source–grid interactive power system.
As shown in Figure 8, Build system-friendly new energy power stations serves as a central node in a dense subnetwork. It has strong semantic ties with Increase the proportion of renewable energy transmission (63 co-occurrences), Apply advanced technologies in transmission channels (35), and Build shared energy storage power stations (24), indicating that this task spans across generation planning, transmission enhancement, and energy storage deployment—suggesting a multi-layered coordination path across core infrastructure.
Another noticeable subnetwork centers around Explore application of new energy storage technologies, which shows substantial co-occurrence with Advance standardization of next-generation coal power (28), Promote application of grid-forming technology (15), and Conduct trial demonstrations of next-generation coal power (28). This reflects a growing integration between energy storage and clean coal-based flexible generation strategies, with energy storage emerging as a pivotal link in supporting green coal-fired transformation.
From a demand-side perspective, Implement coordination between computing power and electricity is frequently coupled with both Build system-friendly new energy power stations (39) and Increase renewable transmission proportion (29), implying that digital infrastructure and computational capacity are becoming integral to real-time scheduling and intelligent grid flexibility under emerging policy directions.
Although some labels exhibit relatively low global frequency, they nonetheless serve as critical semantic bridges. For example, Advance standardization of next-generation coal power co-occurs with Improve the standard system for charging infrastructure (7), Revise distribution grid standards (6), and other foundational tags, suggesting that standardization plays an essential supporting role in bridging diverse policy tasks and enabling multi-label alignment.
In summary, the label co-occurrence network provides a detailed mapping of semantic linkages among policy tasks. It identifies tightly coupled task clusters and reveals latent thematic paths embedded within the policy corpus. These structural insights offer a robust foundation for downstream analyses such as priority evaluation, task cluster identification, and causal reasoning across policy modules.
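As a small worked example, the edges quoted in this subsection can be loaded into an adjacency structure to compare how strongly each task is tied to its neighbors. Only a subset of the network is reported in the text, so the totals below are partial; the shortened label names and the helper names `build_adjacency` and `strength` are assumptions of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Co-occurrence counts quoted in the text; label names are abbreviated here.
EDGES: List[Tuple[str, str, int]] = [
    ("system-friendly stations", "main grid framework", 81),
    ("system-friendly stations", "renewable transmission share", 63),
    ("system-friendly stations", "advanced transmission tech", 35),
    ("system-friendly stations", "shared energy storage", 24),
    ("computing-electricity coordination", "system-friendly stations", 39),
    ("computing-electricity coordination", "renewable transmission share", 29),
]


def build_adjacency(edges: List[Tuple[str, str, int]]) -> Dict[str, Dict[str, int]]:
    """Store each undirected edge in both directions."""
    adjacency: Dict[str, Dict[str, int]] = defaultdict(dict)
    for a, b, weight in edges:
        adjacency[a][b] = weight
        adjacency[b][a] = weight
    return dict(adjacency)


def strength(adjacency: Dict[str, Dict[str, int]], node: str) -> int:
    """Sum of co-occurrence counts over all listed neighbors of a node."""
    return sum(adjacency[node].values())


adj = build_adjacency(EDGES)
hub = max(adj, key=lambda node: strength(adj, node))
print(hub, strength(adj, hub), len(adj[hub]))
```

Among the listed edges, the system-friendly stations node has both the most neighbors and the largest total weight, consistent with its description as a central node.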

4.4.3. Evaluation Results of Label Priorities

Table 8 presents the top 20 policy task labels ranked by their PEI, along with their corresponding network centrality scores. Figure 9 visualizes the comparative trends of PEI and centrality using a dual-axis line chart. This analysis integrates both the semantic attention strength (label frequency across text sources) and structural connectivity (position within the co-occurrence network), allowing a comprehensive identification of core and strategic policy tasks.
At the top of the ranking, Build system-friendly new energy power stations (centrality = 0.50, PEI = 0.88) and Optimize and strengthen the main grid framework (0.54, 0.83) dominate both dimensions. The former emerges as the most semantically emphasized task and a central hub in the co-occurrence network, indicating its dual role as a thematic focus and a structural anchor. The latter, although slightly lower in PEI, exhibits the highest centrality among top-tier tasks, implying its critical bridging role across generation, transmission, and load-side clusters.
Labels in the middle range—such as Explore application of new energy storage technologies (0.67, 0.67), Increase the proportion of renewable energy transmission (0.58, 0.66), and Promote application of grid-forming technology (0.58, 0.62)—demonstrate balanced strength in both frequency and structural embedding. These tasks are typically situated at the intersection of multiple policy clusters, connecting upstream planning with downstream implementation.
A close examination of Table 8 reveals a key insight into the model’s behavior and the value of the PEI. The label “Build shared energy storage power stations” exemplifies a critical case: despite its relatively low prediction frequency (71 occurrences), our framework successfully captured its latent strategic importance. This was achieved through structural network analysis, which identified this label as the node with the highest centrality (0.71) in the entire co-occurrence network. This indicates that, while it may not be explicitly mentioned as frequently as other tasks, it is conceptually a critical hub, closely connected to numerous other core tasks. The PEI, by integrating both frequency and centrality, effectively compensated for its lower frequency and elevated its ranking to 6th place overall. This demonstrates the unique advantage of our framework: it mitigates the inherent bias of frequency-based analysis by uncovering structurally critical tasks that might otherwise be overlooked, thereby providing a more holistic view of policy priorities.
Similarly, Improve full-process management of distribution grids (PEI = 0.39) and Design intelligent scheduling systems (PEI = 0.47) maintain moderate PEI scores, yet demonstrate strong linkages to higher-ranked tasks such as Grid backbone optimization and Grid-forming technologies, signifying their structural relevance to grid reliability and system intelligence enhancement.
As shown in Figure 9, the centrality line is notably higher than the PEI line for several tasks, such as Shared energy storage, Advanced energy storage technologies, and Computing-power coordination, indicating a strong structural influence despite limited textual emphasis. Conversely, tasks such as Grid-forming technologies and Backbone enhancement maintain top positions on both axes, reinforcing their status as mainstream strategic anchors. At the lower end, labels like Establish dedicated working mechanisms and Enhance integration between electric vehicles and the grid score low on both axes, reflecting their current marginal status in policy attention and interconnectivity.
Overall, the PEI metric effectively captures both the overt and latent influence of policy tasks. By jointly analyzing frequency and centrality, the framework distinguishes between “popular tags” and “structural hubs”.
Scope of inference. The conclusions of this study are drawn from national-level public documents. Application to provincial/municipal and project-level texts is outside the present scope and will be explored in subsequent work; no broader claim of generalization is made here.

5. Conclusions and Future Directions

5.1. Conclusions

This study integrates large-scale policy text mining with multi-label semantic modeling to systematically characterize the structure of policy tasks related to China’s new-type power system. The proposed pipeline delivers reliable results on national-level corpora and turns policy semantics into decision-oriented signals. Broader applicability to local and project-level texts is not claimed here and will be investigated as part of our subsequent work. Focusing on three analytical dimensions—label heat distribution, co-occurrence network structure, and priority evaluation—it yields the following key conclusions:
(1)
Policy tasks exhibit clear differences in expression across text types, reflecting a multi-layered transmission mechanism of “planning orientation” to “media reinforcement” to “policy formalization.” For example, the label Build system-friendly new energy power stations appears 107 times in strategic planning documents, 80 times in energy news reports, and only 24 times in official government policies. Similarly, Optimize and strengthen the main grid framework and Increase the proportion of renewable energy transmission appear 80 and 65 times in news sources, compared to just 29 and 17 times in government texts, respectively. These figures suggest that top-level plans focus on forward-looking system deployment, media texts emphasize implementation progress and public dissemination, while official policies adopt more standardized and cautious expressions. The structural distribution of tasks across sources reveals the functional division and coordination inherent in policy communication pathways.
(2)
High-frequency task labels form well-defined co-occurrence structures, building a multi-level task subnetwork covering generation, transmission, storage, regulation, and end-use. As shown in the co-occurrence matrix, Build system-friendly new energy power stations and Optimize and strengthen the main grid framework co-occur 81 times—one of the thickest links in the entire network. This label also frequently co-occurs with Increase the proportion of renewable energy transmission (63 times) and Apply advanced technologies in transmission channels (35 times). These relationships demonstrate that power station deployment, grid optimization, and transmission capacity enhancement are tightly interlinked in policy discourse, jointly forming a source–grid interaction pathway. Meanwhile, low-frequency labels such as Build shared energy storage power stations and Coordinate computing and electricity are connected to multiple high-frequency nodes, indicating their structural importance as embedded components within multi-task policy flows.
(3)
The integrated priority evaluation reveals a dual-layer strategic backbone, composed of “high-frequency & high-connectivity” core tasks and “low-frequency & high-centrality” latent hub tasks. Specifically, Build system-friendly new energy power stations and Optimize and strengthen the main grid framework rank first and second in PEI (0.88 and 0.83), with network centrality scores of 0.50 and 0.54, respectively. In contrast, Build shared energy storage power stations holds the highest centrality score (0.71) but a lower PEI value (0.52). This indicates that the former are dominant drivers in current policy deployment, combining semantic salience with structural importance. The latter, while less emphasized textually, functions as a key cross-cutting node in the task network, with the potential to evolve into a “medium- to long-term strategic coordination hub,” and should be given increased policy attention.

5.2. Engineering Applications and Strategic Implications

The analytical framework developed in this study is not merely an academic exercise; it provides actionable intelligence and strategic insights for power system planners, policymakers, and grid enterprises. The primary applications include:
Strategic Investment Guidance: The identified high-priority tasks, such as “Optimize and strengthen the main grid framework” and “Build shared energy storage power stations,” offer a data-driven foundation for formulating annual investment plans and long-term development roadmaps. Resource allocation can be prioritized towards these strategically central domains.
Synergistic Project Planning: The label co-occurrence network reveals tightly coupled tasks (e.g., between new energy deployment and grid modernization). This helps identify technical links that require coordinated advancement, preventing “siloed” or isolated construction projects and promoting integrated system development.
Automated Compliance and Management: The multi-label classification model can be deployed to automatically process vast volumes of policy documents, assisting in project compliance review, task extraction, and alignment with national strategic goals, thereby improving operational efficiency.
Furthermore, the macro-strategic priorities identified here provide the essential context and justification for downstream, specific engineering research. The high priority of grid optimization, for instance, directly validates and underscores the necessity of developing advanced protection systems for more complex and interconnected grids, such as the research on relay protection for bipolar LCC HVDC lines [43]. Similarly, the strong coupling between generation, storage, and grid tasks highlights the critical need for enhancing system resilience, which necessitates robust restoration methodologies for multi-energy systems under extreme events, as explored in the study on distributed market-aided restoration for typhoon disasters [44].
In this holistic view, our work provides the top-level strategic mapping derived from policy discourse, while studies like [43,44] represent the critical technological implementations required to achieve these strategic goals. They are complementary components within the complete innovation chain: from policy direction to engineering solution.

5.3. Future Directions

While this study advances label structure modeling and policy task extraction for China’s new-type power system, several limitations remain. First, the policy corpus primarily consists of national-level documents, which may limit the model’s ability to generalize to more granular contexts. The absence of local policy implementations and project-level data is a recognized limitation, as these texts often contain specific details and operational guidelines that differ from top-level design. Second, model training is based on task definitions from the Action Plan for Accelerating the Construction of a New-Type Power System, which, despite its authority, lacks fine-grained label granularity and clear semantic boundaries. Third, the analysis is based on static label distributions, without considering temporal evolution or event-driven dynamics.
Future research may improve the framework in three ways: (1) Data expansion: incorporate local government policies, detailed implementation rules, and project-specific reports to enhance the model’s generalizability and practical utility in diverse regional contexts. Additionally, multilingual and multimodal sources (e.g., charts, structured indicators) could be included. (2) Temporal modeling: incorporate a temporal dimension to track the evolution of policy tasks. This could be achieved by segmenting the document corpus into chronological slices (e.g., by year or policy phase), running the model on each time period, and comparing the changes in label heat, co-occurrence network structures, and PEI rankings over time. Going a step further, dynamic graph neural networks or time series analysis methods could be introduced to model the rise, decline, and convergence of policy themes, ultimately revealing the dynamic evolution pathways of China’s new-type power system development. (3) Knowledge integration: from a modeling perspective, external knowledge could be integrated. For instance, constructing a domain knowledge graph for the power system could allow the model to reason with factual relationships between entities (e.g., technologies, policies, projects), potentially leading to more informed and accurate predictions beyond textual patterns.
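The chronological slicing suggested for future work could start from something as simple as the sketch below, which groups predicted labels by year and computes a per-year label heat. Everything here is hypothetical: the record layout, the function name `label_heat_by_year`, the short label names, and the toy data are placeholders, and no such analysis was performed in this study.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple


def label_heat_by_year(
    records: Iterable[Tuple[int, List[str]]]
) -> Dict[int, Dict[str, float]]:
    """Share of documents per year in which each label is predicted."""
    label_counts: Dict[int, Counter] = defaultdict(Counter)
    doc_counts: Counter = Counter()
    for year, labels in records:
        doc_counts[year] += 1
        label_counts[year].update(set(labels))  # count each label once per document
    return {
        year: {label: count / doc_counts[year] for label, count in label_counts[year].items()}
        for year in sorted(label_counts)
    }


# Toy records (hypothetical): (publication year, predicted labels).
records = [
    (2022, ["storage"]),
    (2022, ["grid", "storage"]),
    (2023, ["grid"]),
    (2023, ["grid", "storage"]),
    (2023, ["grid"]),
]
heat = label_heat_by_year(records)
for year, row in heat.items():
    print(year, {label: round(value, 2) for label, value in sorted(row.items())})
```

Comparing the per-year rows would then show which labels gain or lose prominence between slices; the same grouping could feed per-year co-occurrence matrices and PEI rankings.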

Author Contributions

Conceptualization, M.Z. and M.L.; methodology, M.Z.; investigation, Y.W.; resources, M.Z.; data curation, M.Z.; writing—original draft preparation, M.Z. and M.L.; writing—review and editing, M.Z.; supervision, H.C.; software, H.C.; validation, H.C.; funding acquisition, H.C.; formal analyses, H.C.; project administration, Y.W., L.L., and Y.Z.; visualization, L.L. and Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Science and Technology Project of State Grid Corporation of China (Grant No. 1400-202456284A-1-1-ZN, Research on the collaborative allocation and feedback control technology of project reservation for business optimization). The authors declare that this study received funding from the State Grid Corporation of China. The funder had the following involvement with the study: it participated in the research and provided public data required for the research.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

Authors Minghong Liu and Lingshuang Liu were employed by the State Grid Xinjiang Electric Power Company Economic and Technological Research Institute; author Yan Zhang was employed by the State Grid Xinjiang Electric Power Corporation. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as potential conflicts of interest. The funders had no role in the design of the study; in the collection, analysis, or interpretation of the data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Table A1. Baseline comparison.

| Model | Macro-F1 | Micro-F1 | Samples-F1 |
|---|---|---|---|
| (B1) Linear SVM (TF-IDF) | 0.748 | 0.741 | 0.752 |
| (B2) RoBERTa-wwm-ext (fine-tuned) | 0.804 | 0.812 | 0.817 |
| Full-Scale Model | 0.831 | 0.849 | 0.855 |

References

  1. IEA. An Energy Sector Roadmap to Carbon Neutrality in China; IEA: Paris, France, 2021. [Google Scholar]
  2. National DARC. The 14th Five-Year Plan for a Modern Energy System; National Development and Reform Commission: Beijing, China, 2022.
  3. National EA. Guiding Opinions on Accelerating the Construction of a New-Type Power System; National Energy Administration: Beijing, China, 2021.
  4. Wang, X.; Huang, L.; Daim, T.; Li, X.; Li, Z. Evaluation of China’s new energy vehicle policy texts with quantitative and qualitative analysis. Technol. Soc. 2021, 67, 101770. [Google Scholar] [CrossRef]
  5. The CCOT; The SC. Opinions on Fully and Accurately Implementing the New Development Philosophy to Achieve Carbon Peak and Carbon Neutrality; General Office of the CPC Central Committee and the State Council: Beijing, China, 2021. [Google Scholar]
  6. Wang, Y.; Chen, X. Development Analysis and Prospective Research for New Type Power Systems. In Proceedings of the 2023 IEEE 7th Conference on Energy Internet and Energy System Integration (EI2), Hangzhou, China, 15–18 December 2023; pp. 2969–2974. [Google Scholar]
  7. Chalkidis, I.; Fergadiotis, M.; Malakasiotis, P.; Aletras, N.; Androutsopoulos, I. LEGAL-BERT: The Muppets straight out of law school. In Findings of the Association for Computational Linguistics: EMNLP 2020; Association for Computational Linguistics: Kerrville, TX, USA, 2020. [Google Scholar]
  8. Nam, J.; Kim, J.; Loza Mencía, E.; Gurevych, I.; Fürnkranz, J. Large-Scale Multi-label Text Classification—Revisiting Neural Networks; Calders, T., Esposito, F., Hüllermeier, E., Meo, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2014; pp. 437–452. [Google Scholar]
  9. Mohammed, H.H.; Dogdu, E.; Görür, A.K.; Choupani, R. Multi-Label Classification of Text Documents Using Deep Learning. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 4681–4689. [Google Scholar]
  10. Liu, Z.; Grau-Bove, J.; Orr, S. BERT-Flow-VAE: A Weakly-supervised Model for Multi-Label Text Classification. In Proceedings of the 29th International Conference on Computational Linguistics, Gyeongju, Republic of Korea, 12–17 October 2022. [Google Scholar]
  11. Schonlau, M.; Weiß, J.; Marquardt, J. Multi-label classification of open-ended questions with BERT. In Proceedings of the 2023 Big Data Meets Survey Science (BigSurv), Quito, Ecuador, 26–29 October 2023; pp. 1–8. [Google Scholar]
  12. Zhang, W.; Jiang, Y.; Fang, Y.; Pan, S. Hierarchical contrastive learning for multi-label text classification. Sci. Rep. 2025, 15, 14101. [Google Scholar] [CrossRef]
  13. Liu, Y.; Xu, F.; Zhao, Y.; Ma, Z.; Wang, T.; Zhang, S.; Tian, Y. Hierarchical multi-instance multi-label learning for Chinese patent text classification. Connect. Sci. 2024, 36, 2295818. [Google Scholar] [CrossRef]
  14. Yarullin, R.; Serdyukov, P. BERT for Sequence-to-Sequence Multi-label Text Classification; van der Aalst, W.M.P., Batagelj, V., Ignatov, D.I., Khachay, M., Koltsova, O., Kutuzov, A., Kuznetsov, S.O., Lomazova, I.A., Loukachevitch, N., Napoli, A., et al., Eds.; Springer International Publishing: Cham, Switzerland, 2021; pp. 187–198. [Google Scholar]
  15. Veeranki, S.P.K.; Abdulnazar, A.; Kramer, D.; Kreuzthaler, M.; Lumenta, D.B. Multi-label text classification via secondary use of large clinical real-world data sets. Sci. Rep. 2024, 14, 26972. [Google Scholar] [CrossRef]
  16. Singh, D. Building a Multi-Label Multi-Class Text Classifier with BERT: A Step-by-Step Guide with Code; Medium: San Francisco, CA, USA, 2024. [Google Scholar]
  17. Yang, B.; Zhang, B.; Cutsforth, K.; Yu, S.; Yu, X. Emerging industry classification based on BERT model. Inf. Syst. 2025, 128, 102484. [Google Scholar] [CrossRef]
  18. Corringham TASD. BERT Classification of Paris Agreement Climate Action Plans. In Proceedings of the Icml 2021 Workshop On Tackling Climate Change with Machine Learning, Virtually, 23 July 2021. [Google Scholar]
  19. Jamshidi, S.; Mohammadi, M.; Bagheri, S.; Najafabadi, H.E.; Rezvanian, A.; Gheisari, M.; Ghaderzadeh, M.; Shahabi, A.S.; Wu, Z. Effective text classification using BERT, MTM LSTM, and DT. Data Knowl. Eng. 2024, 151, 102306. [Google Scholar] [CrossRef]
  20. Bondarenko, I.; Campisi, T.; Tesoriere, G.; Neduzha, L. Using Detailing Concept to Assess Railway Functional Safety. Sustainability 2023, 15, 18. [Google Scholar] [CrossRef]
  21. Dietz, T. Narrowing the US energy efficiency gap. Proc. Natl. Acad. Sci. USA 2010, 107, 16007–16008. [Google Scholar] [CrossRef]
  22. Campbell, H.F.; Brown, R.P.C. Benefit–Cost Analysis: Financial and Economic Appraisal Using Spreadsheets; Cambridge University Press: Cambridge, UK, 2018. [Google Scholar]
  23. Jaffe, A.B.; Stavins, R.N. The energy-efficiency gap What does it mean? Energy Policy 1994, 22, 804–810. [Google Scholar] [CrossRef]
  24. Keppo, I.; Butnar, I.; Bauer, N.; Caspani, M.; Edelenbosch, O.; Emmerling, J.; Fragkos, P.; Guivarch, C.; Harmsen, M.; Lefèvre, J.; et al. Exploring the possibility space: Taking stock of the diverse capabilities and gaps in integrated assessment models. Environ. Res. Lett. 2021, 16, 53006. [Google Scholar] [CrossRef]
  25. Ackerman, F.; DeCanio, S.J.; Howarth, R.B.; Sheeran, K. Limitations of integrated assessment models of climate change. Clim. Change 2009, 95, 297–315. [Google Scholar] [CrossRef]
  26. Sharpe, S. Five Times Faster: Rethinking the Science, Economics, and Diplomacy of Climate Change; Cambridge University Press: Cambridge, UK, 2023. [Google Scholar]
  27. Hughes, N.; Strachan, N. Methodological review of UK and international low carbon scenarios. Energy Policy 2010, 38, 6056–6065. [Google Scholar] [CrossRef]
  28. Bataille, C.; Waisman, H.; Colombier, M.; Segafredo, L.; Williams, J. The Deep Decarbonization Pathways Project (DDPP): Insights and emerging issues. Clim. Policy 2016, 16, S1–S6. [Google Scholar] [CrossRef]
  29. Sovacool, B.K.; Dworkin, M.H. Energy justice: Conceptual insights and practical applications. Appl. Energy 2015, 142, 435–444. [Google Scholar] [CrossRef]
  30. Heffron, R.J.; McCauley, D. The ‘just transition’ threat to our Energy and Climate 2030 targets. Energy Policy 2022, 165, 112949. [Google Scholar] [CrossRef]
  31. Liu, B.; Blekas, K.; Tsoumakas, G. Multi-label sampling based on local label imbalance. Pattern Recognit. 2022, 122, 108294. [Google Scholar] [CrossRef]
  32. Zhang, M.L.; Zhou, Z.H. A Review on Multi-Label Learning Algorithms. IEEE Trans. Knowl. Data Eng. 2014, 26, 1819–1837. [Google Scholar] [CrossRef]
  33. Liao, W.; Wang, Y.; Yin, Y.; Zhang, X.; Ma, P. Improved sequence generation model for multi-label classification via CNN and initialized fully connection. Neurocomputing 2020, 382, 188–195. [Google Scholar] [CrossRef]
  34. Yu, Y.; Zhou, Z.; Zheng, X.; Gou, J.; Ou, W.; Yuan, F. Enhancing Label Correlations in multi-label classification through global-local label specific feature learning to Fill Missing labels. Comput. Electr. Eng. 2024, 113, 109037. [Google Scholar] [CrossRef]
  35. Li, Y.; Shen, J.; Mao, Z. CoocNet: A novel approach to multi-label text classification with improved label co-occurrence modeling. Appl. Intell. 2024, 54, 8702–8718. [Google Scholar] [CrossRef]
  36. Doukas, H.C.; Andreas, B.M.; Psarras, J.E. Multi-criteria decision aid for the formulation of sustainable technological energy priorities using linguistic variables. Eur. J. Oper. Res. 2007, 182, 844–855. [Google Scholar] [CrossRef]
  37. Wang, J.; Jing, Y.; Zhang, C.; Zhao, J. Review on multi-criteria decision analysis aid in sustainable energy decision-making. Renew. Sustain. Energy Rev. 2009, 13, 2263–2278. [Google Scholar] [CrossRef]
  38. Mardani, A.; Zavadskas, E.K.; Khalifah, Z.; Zakuan, N.; Jusoh, A.; Nor, K.M.; Khoshnoudi, M. A review of multi-criteria decision-making applications to solve energy management problems: Two decades from 1995 to 2015. Renew. Sustain. Energy Rev. 2017, 71, 216–256. [Google Scholar] [CrossRef]
  39. Shao, M.; Han, Z.; Sun, J.; Xiao, C.; Zhang, S.; Zhao, Y. A review of multi-criteria decision making applications for renewable energy site selection. Renew. Energy 2020, 157, 377–403. [Google Scholar] [CrossRef]
  40. Meng, F.; An, Q. A new approach for group decision making method with hesitant fuzzy preference relations. Knowl.-Based Syst. 2017, 127, 1–15. [Google Scholar] [CrossRef]
  41. Tijssen, R.J.W. A quantitative assessment of interdisciplinary structures in science and technology: Co-classification analysis of energy research. Res. Policy 1992, 21, 27–44. [Google Scholar] [CrossRef]
  42. Han, B.; Chen, L.; Tian, X. Knowledge based collection selection for distributed information retrieval. Inf. Process. Manag. 2018, 54, 116–128. [Google Scholar] [CrossRef]
  43. Wang, Z.; Hou, H.; Wei, R.; Li, Z. A Distributed Market-Aided Restoration Approach of Multi-Energy Distribution Systems Considering Comprehensive Uncertainties From Typhoon Disaster. IEEE Trans. Smart Grid 2025, 16, 3743–3757. [Google Scholar] [CrossRef]
  44. Tiwari, R.S.; Sharma, J.P.; Gupta, O.H.; Ahmed Abdullah Sufyan, M. Extension of pole differential current based relaying for bipolar LCC HVDC lines. Sci. Rep. 2025, 15, 16142. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Schematic diagram of the research methodology framework.
Figure 2. Schematic diagram of the data acquisition and preprocessing workflow.
Figure 3. Architecture of the improved BERT-based multi-label classification model integrating label.
Figure 4. Evolution of validation performance metrics (Macro-F1, Micro-F1, Samples-F1, and Validation Loss) throughout the model training process.
Figure 5. Heatmap visualization of F1 score performance across different ablation configurations.
Figure 6. Analysis of label distribution characteristics across multiple text sources: (a) proportional distribution; (b) frequency distribution.
Figure 7. Label co-occurrence network constructed based on predicted labels from multiple policy text sources.
Figure 8. Subnetwork of high-frequency co-occurrences centered on Build system-friendly new energy power stations.
Figure 9. Comparative trends of network centrality and PEI scores for top-ranked policy task labels.
Table 1. Nine scale method and its meaning.

| Parameter | Setting Value or Method | Function Description |
|---|---|---|
| Request Method | `requests.get(url, timeout=10)` | Initiate a web page request |
| Web Page Parser | `BeautifulSoup(response.text, 'html.parser')` | Extract main body content from HTML |
| Target Tag | `<p>` tag | Focus on the central content region |
| Automatic Retry Mechanism | Retry 3 times with 2-s intervals | Improve crawling stability |
| Encoding Recognition | `response.apparent_encoding` | Avoid garbled characters |
| Text Sanitization | Unified with PDF extraction model | Ensure format consistency across web and PDF texts |
| Output Format | Excel (.xlsx) | Structure output fields consistent with the PDF extraction module |
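The retry and `<p>`-tag extraction settings listed in Table 1 can be sketched as follows. This is only an illustration under stated assumptions: to stay self-contained it uses the standard-library `html.parser` in place of BeautifulSoup and takes the fetch function as a parameter instead of calling `requests.get`; "retry 3 times" is read as up to three further attempts after the first one. The names `ParagraphExtractor`, `fetch_with_retry`, and `flaky_fetch` are hypothetical.

```python
import time
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text found inside <p> tags (stand-in for BeautifulSoup)."""

    def __init__(self):
        super().__init__()
        self._depth = 0
        self._buffer = []
        self.paragraphs = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == "p" and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                text = "".join(self._buffer).strip()
                if text:
                    self.paragraphs.append(text)
                self._buffer = []

    def handle_data(self, data):
        if self._depth > 0:
            self._buffer.append(data)


def fetch_with_retry(fetch, url, retries=3, delay=2.0, sleep=time.sleep):
    """Call fetch(url); on failure wait `delay` seconds and retry up to `retries` times."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch(url)
        except Exception as error:  # deliberately broad: this is only a sketch
            last_error = error
            if attempt < retries:
                sleep(delay)
    raise last_error


# Hypothetical fetcher that fails twice before returning a page.
calls = {"n": 0}


def flaky_fetch(url):
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("temporary failure")
    return "<html><body><div>menu</div><p>First paragraph.</p><p>Second <b>one</b>.</p></body></html>"


waits = []  # records the delays instead of actually sleeping
html = fetch_with_retry(flaky_fetch, "https://example.com/news", sleep=waits.append)
parser = ParagraphExtractor()
parser.feed(html)
print(parser.paragraphs)
```

Passing the fetch and sleep functions as arguments keeps the retry logic testable without network access; in a real crawler the fetcher would wrap the request call and decode the response with the detected encoding.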
Table 2. Specifications of the PDF extraction algorithm for technical documents.

| Parameter | Setting Value or Method | Function Description |
|---|---|---|
| Extraction Library | PyPDF2 | Implement page-by-page text extraction |
| Page Iteration Method | `for page in reader.pages` | Iteratively retrieve content from each page |
| Directory Page Skipping Logic | If page contains Table of Contents or “Chapter X” more than twice → skip page | Automatically skip front matter pages in policy documents |
| Main Text Extraction Rule | `re.search(r'([-1])', text)` | Locate the starting point of the main text; trim forewords and non-main body pages |
| Whitespace Cleaning | `re.sub(r'\s+', ' ', text)` | Merge multiple consecutive spaces into a single space |
| Character Cleaning | `re.sub(r'[^ \u4e00-\u9fa5a-zA-Z0-9\s.,;:?!()<>\"\'%\u300A\u300B±+\-]', ' ', text)` | Remove garbled and invalid characters; retain valid Chinese, English characters, and punctuation |
| Text Slice Length | Segment: ≤512 characters | Adapt to BERT input length limit and generate multi-sample training data |
| Output Format | Excel (.xlsx), fields: text + label (optional) | Structured storage for convenient annotation and subsequent model training |
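The cleaning and slicing rules in Table 2 can be combined into a short Python sketch. The two regular expressions are taken from the table; the order of the steps (character cleaning before whitespace merging), the final `strip()`, and slicing at fixed character offsets are assumptions made for illustration, since the table does not specify them. The function names `clean_text` and `slice_text` are hypothetical.

```python
import re

# Character whitelist from Table 2; anything outside it is replaced by a space.
INVALID_CHARS = re.compile(
    r'[^ \u4e00-\u9fa5a-zA-Z0-9\s.,;:?!()<>\"\'%\u300A\u300B±+\-]'
)
# Whitespace rule from Table 2: collapse runs of whitespace into one space.
WHITESPACE = re.compile(r'\s+')


def clean_text(text):
    """Replace invalid characters, then merge consecutive whitespace."""
    text = INVALID_CHARS.sub(' ', text)
    return WHITESPACE.sub(' ', text).strip()


def slice_text(text, max_len=512):
    """Cut text into consecutive segments of at most max_len characters."""
    return [text[i:i + max_len] for i in range(0, len(text), max_len)]


raw = "新型\t电力系统★  2025年"
cleaned = clean_text(raw)
segments = slice_text(cleaned)
print(cleaned, len(segments))
```

Cutting at fixed offsets can split a sentence across two segments; a variant that breaks at the nearest sentence-ending punctuation would keep segments more coherent at the cost of slightly uneven lengths.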
Table 3. Specifications of the BERT-driven multi-label classification framework.

| Parameter | Setting Value or Method | Function Description |
|---|---|---|
| Base Model | bert-base-chinese | Chinese pretrained model |
| Maximum Input Length | 512 characters | Ensure semantic integrity for long policy texts |
| Label Embedding Mode | Label embedding + Multi-head attention mechanism | Capture label-text contextual dependencies |
| Loss Function | BCEWithLogitsLoss + pos_weight | Mitigate label imbalance |
| Label Weight Setting | 1/(label_freq)^1.5 | Inversely proportional weighting to enhance low-frequency labels |
| Training Epochs | 10 epochs | Stabilize model fine-tuning |
| Optimizer | AdamW | Avoid overfitting |
| Learning Rate | 2 × 10⁻⁵ | Controlled learning step size |
| Dynamic Threshold Strategy | Scan range [0.1, 0.6], step size 0.05 | Improve macro-F1 performance |
| Top-k Compensation | If all predictions = 0, fill Top-1 label | Prevent empty predictions |
| Validation Plan | Train/validation split = 8:2 | Independent evaluation |
| Evaluation Metrics | Macro-F1, Micro-F1, Samples-F1 | Multi-dimensional performance evaluation |
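Three of the settings in Table 3 (the label weight 1/(label_freq)^1.5, the threshold scan over [0.1, 0.6] in steps of 0.05, and the Top-1 fallback) can be illustrated in plain Python. This sketch assumes a single global threshold chosen to maximize Macro-F1 on validation data and treats `label_freq` as a raw count; the table does not state either detail, and the function names and toy data are hypothetical. In a PyTorch setup the resulting weights would typically be passed to the loss as a tensor.

```python
def label_weights(label_freq, power=1.5):
    """Weight each label by 1 / freq**power, so rare labels weigh more."""
    return {label: 1.0 / (freq ** power) for label, freq in label_freq.items()}


def binarize(probs, threshold):
    """Threshold one sample's probabilities; if nothing fires, keep the Top-1 label."""
    preds = [1 if p >= threshold else 0 for p in probs]
    if not any(preds):
        top = max(range(len(probs)), key=probs.__getitem__)
        preds[top] = 1
    return preds


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 scores."""
    n_labels = len(y_true[0])
    scores = []
    for j in range(n_labels):
        pairs = [(t[j], p[j]) for t, p in zip(y_true, y_pred)]
        tp = sum(1 for t, p in pairs if t == 1 and p == 1)
        fp = sum(1 for t, p in pairs if t == 0 and p == 1)
        fn = sum(1 for t, p in pairs if t == 1 and p == 0)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / n_labels


def scan_threshold(y_true, probs, low=0.1, high=0.6, step=0.05):
    """Return the (threshold, macro_f1) pair with the best Macro-F1 in [low, high]."""
    best_t, best_f1 = low, -1.0
    n_steps = int(round((high - low) / step))
    for k in range(n_steps + 1):
        t = round(low + k * step, 2)
        preds = [binarize(row, t) for row in probs]
        score = macro_f1(y_true, preds)
        if score > best_f1:  # ties keep the lowest threshold
            best_t, best_f1 = t, score
    return best_t, best_f1


# Toy validation data: 3 samples, 3 labels (hypothetical values).
y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0]]
probs = [[0.55, 0.20, 0.35], [0.05, 0.08, 0.02], [0.45, 0.40, 0.15]]

best_t, best_f1 = scan_threshold(y_true, probs)
weights = label_weights({"frequent label": 4, "rare label": 1})
print(best_t, best_f1, weights)
```

In the toy data the second sample has no probability above any scanned threshold, so its prediction comes entirely from the Top-1 fallback, which is the case that the compensation rule in Table 3 is meant to cover.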
Table 4. Description of the ablation study configurations.

| Configuration No. | Model Configuration Name | Model Feature Description |
|---|---|---|
| Config 1 | Full-Scale Model | Label embedding + Multi-head attention + Dynamic thresholding + Weighted BCE + Top-k fallback |
| Config 2 | Disabling Terminology Injection | Remove add_tokens; model uses only the original pretrained vocabulary to evaluate the impact of domain terminology loss |
| Config 3 | Fixed Threshold (0.5) | Disable dynamic threshold scanning; apply a fixed threshold of 0.5 for all labels |
| Config 4 | Unweighted Loss Function | Set all label weights equal in the loss function to evaluate the importance of imbalance handling |
| Config 5 | Top-k Prediction Strategy Only | Remove the sigmoid-thresholding structure; retain only the top-k prediction mechanism |
| Config 6 | Terminology-Free + Weight-Free + Fixed Threshold | Simultaneously remove domain terminology injection, label weighting, and dynamic thresholding, forming a “degraded baseline model” |
| Config 7 | Attention Mechanism Deactivation | Remove the label-text attention layer; retain only the [CLS] token pooled output from BERT |
Table 5. Macro-F1, Micro-F1, Samples-F1, and validation loss during training.

| Epoch | Validation Loss | Macro-F1 | Micro-F1 | Samples-F1 |
|---|---|---|---|---|
| 1 | 1.36 | 0.08 | 0.09 | 0.09 |
| 2 | 1.27 | 0.11 | 0.10 | 0.10 |
| 3 | 1.08 | 0.19 | 0.18 | 0.19 |
| 4 | 0.85 | 0.25 | 0.26 | 0.30 |
| 5 | 0.64 | 0.37 | 0.35 | 0.37 |
| 6 | 0.44 | 0.53 | 0.53 | 0.56 |
| 7 | 0.34 | 0.74 | 0.71 | 0.74 |
| 8 | 0.29 | 0.79 | 0.80 | 0.82 |
| 9 | 0.26 | 0.81 | 0.81 | 0.83 |
| 10 | 0.25 | 0.83 | 0.85 | 0.86 |
Table 6. Structural components and validation performance (Macro-F1, Micro-F1, Samples-F1) for each ablation experiment configuration. Component columns: Label Attention, Terminology Injection, Dynamic Threshold, Label Weighting, Top-k Compensation.

| Configuration Name | × marks across the component columns | Macro-F1 | Micro-F1 | Samples-F1 |
|---|---|---|---|---|
| Config 1 | | 0.831 | 0.849 | 0.855 |
| Config 2 | × | 0.723 | 0.717 | 0.729 |
| Config 3 | ×× | 0.802 | 0.795 | 0.801 |
| Config 4 | × | 0.768 | 0.762 | 0.776 |
| Config 5 | ×× | 0.740 | 0.738 | 0.741 |
| Config 6 | ×××× | 0.698 | 0.690 | 0.693 |
| Config 7 | × | 0.755 | 0.752 | 0.761 |
Table 7. Label frequency distribution across different types of policy texts.

| Label | Strategic Energy Planning Documents | Digital Energy News Platforms | Official Policy Directives on Government Portals |
|---|---|---|---|
| Build system-friendly new energy power stations | 107 | 80 | 24 |
| Optimize and strengthen the main grid framework | 48 | 80 | 29 |
| Explore application of new energy storage technologies | 61 | 55 | 14 |
| Increase the proportion of renewable energy transmission | 66 | 25 | 17 |
| Apply advanced technologies in transmission channels | 28 | 40 | 11 |
| Build shared energy storage power stations | 7 | 34 | 30 |
| Improve full-process management of distribution grids | 51 | 6 | 9 |
| Implement coordination between computing power and electricity | 47 | 14 | 2 |
| Improve the standard system for charging infrastructure | 21 | 26 | 15 |
| Promote application of grid-forming technology | 26 | 29 | 5 |
| Conduct trial demonstrations of next-generation coal power | 46 | 9 | 3 |
| Design intelligent scheduling systems | 34 | 8 | 4 |
| Establish dedicated working mechanisms | 19 | 6 | 17 |
| Develop virtual power plants | 25 | 13 | 2 |
| Enhance integration between electric vehicles and the grid | 21 | 12 | 7 |
| Revise distribution grid standards | 12 | 13 | 9 |
| Upgrade grid-connected performance of new energy entities | 2 | 17 | 15 |
| Advance standardization of next-generation coal power | 11 | 10 | 11 |
| Implement high-penetration demand-side response in pilot regions | 20 | 5 | 5 |
| Innovate scheduling modes for active distribution networks | 10 | 17 | 1 |
Table 8. Centrality and PEI scores for the top 20 policy task labels.

| Label | Centrality | PEI |
|---|---|---|
| Build system-friendly new energy power stations | 0.50 | 0.88 |
| Optimize and strengthen the main grid framework | 0.54 | 0.83 |
| Explore application of new energy storage technologies | 0.67 | 0.67 |
| Increase the proportion of renewable energy transmission | 0.58 | 0.66 |
| Promote application of grid-forming technology | 0.58 | 0.62 |
| Build shared energy storage power stations | 0.71 | 0.52 |
| Apply advanced technologies in transmission channels | 0.58 | 0.48 |
| Design intelligent scheduling systems | 0.58 | 0.47 |
| Improve full-process management of distribution grids | 0.50 | 0.39 |
| Improve the standard system for charging infrastructure | 0.42 | 0.36 |
| Develop virtual power plants | 0.38 | 0.35 |
| Implement coordination between computing power and electricity | 0.46 | 0.34 |
| Conduct trial demonstrations of next-generation coal power | 0.38 | 0.24 |
| Implement high-penetration demand-side response in pilot regions | 0.38 | 0.23 |
| Upgrade grid-connected performance of new energy entities | 0.29 | 0.18 |
| Advance standardization of next-generation coal power | 0.21 | 0.13 |
| Revise distribution grid standards | 0.21 | 0.13 |
| Innovate scheduling modes for active distribution networks | 0.21 | 0.13 |
| Enhance integration between electric vehicles and the grid | 0.21 | 0.12 |
| Establish dedicated working mechanisms | 0.17 | 0.10 |

Share and Cite

MDPI and ACS Style

Zhou, M.; Chen, H.; Liu, M.; Wang, Y.; Liu, L.; Zhang, Y. Development Analysis of China’s New-Type Power System Based on Governmental and Media Texts via Multi-Label BERT Classification. Energies 2025, 18, 4650. https://doi.org/10.3390/en18174650
