Article

Multi-Granularity Invariant Structure Learning for Text Classification in Entrepreneurship Policy

1 School of Business and Management, Jilin University, Changchun 130012, China
2 Scientific Research Institute, Changchun Finance College, Changchun 130022, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(22), 3648; https://doi.org/10.3390/math13223648
Submission received: 14 October 2025 / Revised: 4 November 2025 / Accepted: 10 November 2025 / Published: 14 November 2025
(This article belongs to the Special Issue Artificial Intelligence and Data Science, 2nd Edition)

Abstract

Data-driven text classification technology is crucial for understanding and managing a large number of entrepreneurial policy-related texts, yet it is hindered by two primary challenges. First, the intricate, multi-faceted nature of policy documents often leads to insufficient information extraction, as existing models struggle to synergistically leverage diverse information types, such as statistical regularities, linguistic structures, and external factual knowledge, resulting in semantic sparsity. Second, the performance of state-of-the-art deep learning models is heavily reliant on large-scale annotated data, a resource that is scarce and costly to acquire in entrepreneurial policy domains, rendering models susceptible to overfitting and poor generalization. To address these challenges, this paper proposes a Multi-granularity Invariant Structure Learning (MISL) model. Specifically, MISL first employs a multi-view feature engineering module that constructs and fuses distinct statistical, linguistic, and knowledge graphs to generate a comprehensive and rich semantic representation, thereby alleviating semantic sparsity. Furthermore, to enhance robustness and generalization from limited data, we introduce a dual invariant structure learning framework. This framework operates at two levels: (1) sample-invariant representation learning uses data augmentation and mutual information maximization to learn the essential semantic core of a text, invariant to superficial perturbations; (2) neighborhood-invariant semantic learning applies a contrastive objective on a nearest-neighbor graph to enforce intra-class compactness and inter-class separability in the feature space. Extensive experiments demonstrate that our proposed MISL model significantly outperforms state-of-the-art baselines, proving its effectiveness and robustness for classifying complex texts in entrepreneurial policy domains.

1. Introduction

In an increasingly innovation-driven global economy, governments worldwide have enacted a vast number of policies to foster entrepreneurial ecosystems, aiming to promote economic growth and employment [1,2,3,4,5]. These policies, ranging from macro-level regulations to specific financial subsidies and tax incentives, have generated a massive volume of complex and fragmented policy texts. This information explosion presents significant challenges for policymakers, researchers, and entrepreneurs, including information overload, difficulties in policy evaluation and monitoring, and a lack of transparency [6,7]. To address these challenges, the application of Artificial Intelligence (AI) and Natural Language Processing (NLP) for entrepreneurship policy text classification has emerged [8,9]. This technology automates the process of categorizing immense volumes of unstructured texts, such as policy documents, evaluation reports, press releases, and public feedback, into predefined categories, for instance, by policy instrument, target group, or policy objective. This greatly enhances the efficiency and accuracy of policy analysis. It not only provides data-driven support for evidence-based policy evaluation and optimization but also increases policy transparency. By enabling entrepreneurs and research institutions to more easily access and understand relevant information, it offers crucial technical support for building a healthy and efficient entrepreneurial ecosystem [10,11].
Current data-driven text classification methods are primarily divided into two main categories [12,13,14,15]. The first is based on traditional machine learning, which represents a classic paradigm in the field [16]. Its core process involves transforming unstructured text into machine-readable numerical vectors through handcrafted feature engineering [17]. This approach typically represents text using either the Bag-of-Words model, which treats a document as an unordered collection of word frequencies often weighted by the TF-IDF algorithm, or the N-gram model, which preserves local word order by capturing sequences of consecutive words. Once the text is vectorized, these feature vectors are fed into supervised learning classifiers such as Naive Bayes, Support Vector Machine, or Logistic Regression for training, enabling the model to learn the mapping between text and its category [18,19]. The effectiveness of this class of methods is heavily dependent on the quality of the feature engineering [20].
The second category is composed of methods based on deep learning, which represent the state of the art in text classification technology [21,22]. Their primary advantage lies in the ability to automatically extract high-level semantic features from text through an end-to-end learning process, thus circumventing the need for cumbersome and experience-dependent handcrafted feature engineering [23,24]. These methods begin by mapping words into low-dimensional, dense vectors using word embedding techniques. Subsequently, different neural network architectures are employed to capture textual features: Convolutional neural networks efficiently extract key local phrase information through convolution and pooling operations [25]; long short-term memory networks, a variant of recurrent neural networks, excel at processing text sequences to capture long-range contextual dependencies via their gating mechanisms [26]; and architectures represented by the Transformer model leverage a core self-attention mechanism to process text in parallel, directly calculating the relational strength between words to build exceptionally powerful contextual representations, achieving breakthrough results in virtually all text classification tasks [27,28,29,30].
Despite the superior performance of text classification methods, two critical issues remain, limiting their practical application in the entrepreneurship policy domain. First, the intricate and multi-faceted nature of policy texts often leads to insufficient information extraction. Existing models frequently struggle to synergistically leverage the diverse types of information embedded in the text—such as statistical regularities, linguistic structures, and connections to external factual knowledge—resulting in a superficial semantic understanding and failing to overcome the semantic sparsity prevalent in this field. Second, the performance of state-of-the-art deep learning models is heavily reliant on large-scale annotated datasets for training. In specialized domains like entrepreneurship policy, acquiring such labeled data is a major bottleneck, as it is both labor-intensive and requires domain expertise. This dependency on limited labeled data renders models highly susceptible to overfitting, which severely impairs their ability to generalize to new and unseen policy documents in real-world applications.
To this end, a Multi-granularity Invariant Structure Learning (MISL) model is proposed for text classification in entrepreneurship policy. Specifically, our approach comprises three core components designed to systematically address the aforementioned challenges. First, we introduce a multi-view feature engineering module to tackle the issue of insufficient information extraction. This module constructs three distinct graphs to capture semantics from different perspectives: a statistical graph to model word co-occurrence patterns, a linguistic graph to capture syntactic roles and resolve ambiguity, and a knowledge graph to integrate external factual information. By fusing these heterogeneous views, the model generates a comprehensive and rich representation of the policy text, effectively alleviating semantic sparsity. Second, to mitigate the problem of overfitting on limited labeled data, we employ a sample-invariant representation learning strategy. This is achieved through a data augmentation process where we generate varied instances of the original text. The model is then trained to maximize the mutual information between the representations of the original text and its augmented versions. This forces the model to learn the essential and stable semantic core of the text, thereby enhancing its robustness and generalization capabilities. Finally, the model further refines these representations through neighborhood-invariant semantic learning. By constructing a nearest-neighbor graph and applying a contrastive objective, this component encourages intra-class compactness and inter-class separability in the feature space. It pulls semantically similar policy texts closer together while pushing dissimilar ones apart, which significantly improves the discriminative power of the final representations for more accurate classification.
The main contributions are threefold:
  • We propose a novel multi-view feature engineering module that integrates statistical, linguistic, and knowledge-based representations through distinct graph structures. This approach effectively alleviates semantic sparsity in policy texts by capturing comprehensive contextual information from multiple perspectives.
  • We introduce a sample-invariant representation learning strategy that leverages data augmentation and mutual information maximization. This method enhances the model’s robustness and generalization capabilities by focusing on the essential and stable semantic core of the text, even with limited labeled data.
  • We develop a neighborhood-invariant semantic learning component that constructs a nearest-neighbor graph and applies a contrastive objective. This technique improves the discriminative power of the final representations by promoting intra-class compactness and inter-class separability, leading to more accurate text classification.

2. The Proposed Method

Given a text document $D$ composed of a sequence of $L$ tokens, where the document belongs to one of $C$ predefined categories, the text classification task aims to learn a mapping function $f(\cdot)$ that predicts the corresponding category label $y \in \{1, 2, \ldots, C\}$, i.e., $f: D \rightarrow y$. To achieve this objective, this paper proposes a multi-granularity invariant structure learning method (MISL) for mining short text patterns, which comprises multi-view feature engineering, sample-invariant representation learning, and neighborhood-invariant semantic learning, as shown in Figure 1.

2.1. Multi-View Feature Engineering

Multi-view feature engineering aims to synergistically leverage the intrinsic statistical regularities and linguistic structures of the text as well as the factual knowledge provided by external knowledge bases, to alleviate semantic sparsity prevalent in texts. To achieve this objective, we design and construct three distinct graph structures to capture and model each of these information types, thereby generating a richer contextual representation for short texts.
Statistical representation. The statistical regularities of a text reflect the distributional properties and co-occurrence patterns of its vocabulary. To this end, we construct a statistical graph $G_s = \{V_s, X_s, A_s\}$ to explicitly model this type of information. In this graph, $V_s$ represents the set of word nodes. The connection strength between nodes is defined by an adjacency matrix $A_s \in \mathbb{R}^{|V_s| \times |V_s|}$, whose weights are calculated using Pointwise Mutual Information (PMI) to quantify the co-occurrence probability of word pairs:
$$A_{s,ij} = \max(\mathrm{PMI}(v_i, v_j), 0), \quad v_i, v_j \in V_s$$
For node features $X_s$, we use GloVe word vectors for initialization. Then, we feed the statistical graph into a two-layer graph convolutional network $\mathrm{GCN}(\cdot)$ to learn the statistical representation $Z_s$:
$$Z_s = \mathrm{GCN}(\mathrm{GCN}(X_s, A_s), A_s)$$
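For concreteness, the sketch below builds such a PMI-weighted adjacency from sliding-window co-occurrence counts. The corpus format and the window size are illustrative assumptions; the paper does not specify them.

```python
import numpy as np
from collections import Counter
from itertools import combinations

def pmi_adjacency(docs, vocab, window=5):
    """Build A_s: clamped-PMI weights over word co-occurrence windows.
    docs: list of token lists; vocab: list of unique words; window: assumed size."""
    idx = {w: i for i, w in enumerate(vocab)}
    word_cnt, pair_cnt, n_win = Counter(), Counter(), 0
    for tokens in docs:
        for s in range(max(1, len(tokens) - window + 1)):
            win = sorted({t for t in tokens[s:s + window] if t in idx})
            n_win += 1
            word_cnt.update(win)
            pair_cnt.update(combinations(win, 2))
    A = np.zeros((len(vocab), len(vocab)))
    for (wi, wj), c in pair_cnt.items():
        # PMI(i, j) = log( p(i, j) / (p(i) p(j)) ), estimated over windows
        pmi = np.log(c * n_win / (word_cnt[wi] * word_cnt[wj]))
        A[idx[wi], idx[wj]] = A[idx[wj], idx[wi]] = max(pmi, 0.0)  # clamp at 0
    return A
```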
Linguistic representation. Linguistic structure, particularly syntactic information, is crucial for accurately interpreting the meaning of a text. We focus on Part-of-Speech (POS) tagging to parse the syntactic roles of words (e.g., distinguishing adjectives from adverbs), thereby resolving lexical ambiguity. To this end, we construct a linguistic graph $G_l = \{V_l, X_l, A_l\}$, where $V_l$ is the set of nodes for all unique POS tags. The adjacency matrix $A_l$ is likewise computed from PMI to model the co-occurrence patterns of different POS tags:
$$A_{l,ij} = \max(\mathrm{PMI}(v_i, v_j), 0), \quad v_i, v_j \in V_l$$
The initial feature for each POS node, $X_l$, is a one-hot vector. As with the statistical representation, a two-layer graph convolutional network generates the linguistic representation $Z_l$:
$$Z_l = \mathrm{GCN}(\mathrm{GCN}(X_l, A_l), A_l)$$
Knowledge representation. External factual knowledge can supplement short texts with necessary background information, significantly enhancing the performance of classification models. To incorporate this knowledge, we first identify and link entities from the text to an external knowledge base and then construct a knowledge graph $G_k = \{V_k, X_k, A_k\}$. In the implementation, we use the TAGME toolkit to perform entity linking against the NELL knowledge base, where $V_k$ is the resulting set of entity nodes. The initial embeddings for these entities, $X_k$, are generated using the TransE model, which is adept at learning entity relationships in a knowledge graph. The adjacency matrix $A_k$ is derived from the cosine similarity between entity embedding pairs to capture semantic proximity:
$$A_{k,ij} = \max(\cos(X_{k,i}, X_{k,j}), 0)$$
Likewise, a two-layer graph convolutional network generates the knowledge representation $Z_k$:
$$Z_k = \mathrm{GCN}(\mathrm{GCN}(X_k, A_k), A_k)$$
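All three views share the same two-layer GCN form. A minimal dense-matrix sketch follows; the symmetric normalization, ReLU activation, and layer widths are assumptions, since the paper only states that each view uses a two-layer GCN.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TwoLayerGCN(nn.Module):
    """Z = A_hat ReLU(A_hat X W0) W1 over a dense adjacency A."""
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.w0 = nn.Linear(in_dim, hid_dim, bias=False)
        self.w1 = nn.Linear(hid_dim, out_dim, bias=False)

    @staticmethod
    def normalize(a):
        a_hat = a + torch.eye(a.size(0))          # add self-loops
        d = a_hat.sum(dim=1).pow(-0.5)            # D^{-1/2}
        return d.unsqueeze(1) * a_hat * d.unsqueeze(0)

    def forward(self, x, a):
        a_norm = self.normalize(a)
        h = F.relu(a_norm @ self.w0(x))           # first GCN layer
        return a_norm @ self.w1(h)                # second GCN layer

# Usage (dims illustrative): z_s = TwoLayerGCN(300, 256, 128)(x_s, a_s)
```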
Based on the above three representations, a multi-view fusion function $\mathrm{Fusion}(\cdot)$ is designed to aggregate complementary information:
$$Z = \mathrm{Fusion}(Z_s, Z_l, Z_k)$$
where $Z$ denotes the fused representation of the text. In the experiments, $\mathrm{Fusion}(\cdot)$ is implemented as the concatenation of the three representations.

2.2. Sample-Invariant Representation Learning

Sample-invariant representation learning aims to exploit the invariant structure hidden in each text to boost the discriminative ability of the representations. To accomplish this, we introduce a data augmentation strategy to generate augmented texts and then maximize the mutual information between each augmented text and its original text, endowing the representations with sample-invariant structure.
Specifically, we define a text data augmentation strategy $\mathrm{Aug}$ that applies one of three augmentation methods to generate an augmented text:
$$\bar{D} = \mathrm{Aug}_j(D), \quad j \in \{a, b, c\}, \qquad \mathrm{Aug} = \{\mathrm{Aug}_a(D, \alpha),\ \mathrm{Aug}_b(D, \beta),\ \mathrm{Aug}_c(D, \gamma)\}$$
where $\mathrm{Aug}_a$, $\mathrm{Aug}_b$, and $\mathrm{Aug}_c$ denote back translation, random deletion, and synonym replacement, and $\alpha$, $\beta$, and $\gamma$ are the corresponding back translation, random deletion, and synonym replacement rates. $\bar{D}$ denotes the augmented version of the text $D$. Passing $\bar{D}$ through the multi-view feature engineering module yields the augmented fusion representation $\bar{Z}$.
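A sketch of the $\mathrm{Aug}$ strategy is given below, assuming token lists, a user-supplied synonym table, and an external back-translation callable; the default rates stand in for $\beta$ and $\gamma$ and are not the paper's tuned values.

```python
import random

def random_deletion(tokens, beta=0.1):
    """Aug_b: drop each token with probability beta; keep at least one token."""
    kept = [t for t in tokens if random.random() > beta]
    return kept if kept else [random.choice(tokens)]

def synonym_replacement(tokens, gamma=0.1, synonyms=None):
    """Aug_c: replace roughly a gamma-fraction of tokens via a synonym table."""
    synonyms = synonyms or {}
    out = list(tokens)
    n_swap = max(1, int(gamma * len(tokens)))
    for i in random.sample(range(len(tokens)), k=min(n_swap, len(tokens))):
        out[i] = random.choice(synonyms.get(out[i], [out[i]]))
    return out

def augment(tokens, back_translate=None):
    """Draw one augmentation Aug_j, j in {a, b, c}, uniformly at random.
    back_translate (Aug_a) is an external callable, e.g. an MT round trip."""
    choices = [random_deletion, synonym_replacement]
    if back_translate is not None:
        choices.append(back_translate)
    return random.choice(choices)(tokens)
```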
In general, data augmentation should not alter the semantic information of the text; that is, the original and augmented texts should have consistent semantics. Thus, the mutual information between the original fusion representation and the augmented text is maximized:
$$\max I(Z, \bar{D})$$
where $I(\cdot, \cdot)$ denotes the mutual information function. By the definition of mutual information, the following holds:
$$I(Z, \bar{D}) = \iint p(Z \mid \bar{D})\, p(\bar{D}) \log \frac{p(Z \mid \bar{D})}{p(Z)} \, d\bar{D}\, dZ = \mathrm{KL}\big(p(Z \mid \bar{D})\, p(\bar{D}) \,\big\|\, p(Z)\, p(\bar{D})\big)$$
where $p(\bar{D})$ denotes the marginal distribution of the augmented text, $p(Z)$ is the marginal distribution of the representations, and $p(Z \mid \bar{D})$ is the conditional distribution of $Z$ given $\bar{D}$. $\mathrm{KL}(\cdot \,\|\, \cdot)$ denotes the Kullback–Leibler divergence.
However, the direct optimization of a KL divergence-based objective presents a significant challenge in the context of deep learning. The unbounded nature of the KL divergence can lead to unstable training dynamics, where the loss can grow indefinitely, causing exploding gradients. To mitigate this, we seek a more robust and well-behaved alternative. The Jensen–Shannon (JS) divergence is an ideal candidate because it is both symmetric and, crucially, bounded within the range [ 0 , log 2 ] . By substituting JS divergence for KL divergence, we reformulate our objective as follows:
$$\max I(Z, \bar{D}) = \mathrm{JS}\big(p(Z \mid \bar{D})\, p(\bar{D}) \,\big\|\, p(Z)\, p(\bar{D})\big)$$
While theoretically appealing, the direct computation of JS divergence is often intractable, as it requires knowledge of the distribution p ( Z ) , which is typically unknown. To overcome this, we employ a variational estimation approach. The variational lower bound of the JS divergence between two distributions, p ( X ) and q ( X ) , is defined as:
$$\mathrm{JS}\big(p(X) \,\|\, q(X)\big) \geq \sup_{T}\Big( \mathbb{E}_{x \sim p(X)}\big[\log \sigma(T(x))\big] + \mathbb{E}_{x \sim q(X)}\big[\log\big(1 - \sigma(T(x))\big)\big] \Big)$$
where $T(\cdot)$ is a function, typically a neural network, that acts as a discriminator, and $\sigma(\cdot)$ is the sigmoid function. $\mathbb{E}$ denotes the expectation operator. This framework effectively reframes the problem of density ratio estimation as a more tractable binary classification task. Replacing $p(X)$ with $p(Z \mid \bar{D})\, p(\bar{D})$ and $q(X)$ with $p(Z)\, p(\bar{D})$, we obtain:
$$\max I(Z, \bar{D}) = \mathbb{E}_{(\bar{D}, Z) \sim p(Z \mid \bar{D})\, p(\bar{D})}\big[\log \sigma(T(\bar{D}, Z))\big] + \mathbb{E}_{(\bar{D}, Z) \sim p(Z)\, p(\bar{D})}\big[\log\big(1 - \sigma(T(\bar{D}, Z))\big)\big]$$
In practice, the expectation over the product of marginals is approximated via negative sampling. An augmented text $\bar{D}$ paired with the fusion representation $Z$ of its corresponding original text $D$ forms a positive pair, while the same augmented text $\bar{D}$ paired with the fusion representation of a different original text forms a negative pair. By training the discriminator to distinguish these pairs, the representation learning model is implicitly forced to maximize the mutual information.
Finally, the loss $\mathcal{L}_{spl}$ of sample-invariant representation learning is defined as:
$$\mathcal{L}_{spl} = -\big( I(Z, \bar{D}) + I(\bar{Z}, D) \big)$$
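A minimal PyTorch sketch of the variational JS estimator follows, assuming the augmented text has already been encoded into a vector (e.g., its fusion representation); the discriminator architecture and in-batch shuffling for negative sampling are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Discriminator(nn.Module):
    """T(d_bar, z): scores whether a pair is drawn from the joint
    p(Z | D_bar) p(D_bar) or from the product of marginals p(Z) p(D_bar)."""
    def __init__(self, d_dim, z_dim, hid=256):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(d_dim + z_dim, hid),
                                 nn.ReLU(),
                                 nn.Linear(hid, 1))

    def forward(self, d_bar, z):
        return self.net(torch.cat([d_bar, z], dim=-1)).squeeze(-1)

def spl_half_loss(disc, d_bar, z):
    """Negative JS lower bound on I(Z, D_bar) for one direction of L_spl.
    Row i of d_bar and row i of z form a positive pair; shuffling z within
    the batch yields negative pairs (negative sampling)."""
    pos = disc(d_bar, z)
    neg = disc(d_bar, z[torch.randperm(z.size(0))])
    # log(1 - sigmoid(x)) == logsigmoid(-x)
    bound = F.logsigmoid(pos).mean() + F.logsigmoid(-neg).mean()
    return -bound  # minimizing this maximizes the MI lower bound
```

The full $\mathcal{L}_{spl}$ would sum this term with its mirror image, swapping the roles of the original and augmented encodings.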

2.3. Neighborhood-Invariant Semantic Learning

Neighborhood-invariant semantic learning aims to bring similar samples with high confidence closer together and keep dissimilar samples with high confidence away to capture text patterns with intra-class compactness and inter-class separability.
Specifically, given the fusion representations, we construct a nearest-neighbor graph $G_n$ by selecting the top-$k$ most similar neighbors for each text sample $D$:
$$G_{n,ij} = \begin{cases} 1 & \text{if } D_i \in \varphi(D_j) \ \text{or} \ D_j \in \varphi(D_i) \\ 0 & \text{otherwise} \end{cases}$$
where $\varphi(D_j)$ denotes the $k$-nearest-neighbor set of the $j$-th text sample $D_j$, and $G_{n,ij}$ denotes an element of the nearest-neighbor graph $G_n$.
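A sketch of this graph construction, assuming cosine similarity over the fused representations; the similarity measure and the value of $k$ are illustrative.

```python
import numpy as np

def knn_graph(z, k=10):
    """Symmetric k-NN adjacency G_n over fused representations z of shape (N, d)."""
    z_norm = z / np.linalg.norm(z, axis=1, keepdims=True)
    sim = z_norm @ z_norm.T                      # pairwise cosine similarity
    np.fill_diagonal(sim, -np.inf)               # exclude self-matches
    nn_idx = np.argsort(-sim, axis=1)[:, :k]     # top-k neighbors per row
    g = np.zeros(sim.shape, dtype=np.int8)
    g[np.repeat(np.arange(len(z)), k), nn_idx.ravel()] = 1
    return np.maximum(g, g.T)                    # "D_i in phi(D_j) or D_j in phi(D_i)"
```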
Following the initial identification of nearest neighbors for every sample, a naive method might consider this entire neighbor set as the basis for forming positive pairs. Such a strategy, however, risks contaminating the learning signal with semantically inconsistent pairings, as some neighbors may be incidental rather than truly related. To address this potential for noise, we implement a robust filtering mechanism that consults the pre-computed, stable class assignments. Our model mandates that for a pair of samples to be considered positive, they must simultaneously fulfill two requirements: a local relationship (being nearest neighbors) and a global one (residing in the same class). In contrast, pairs are defined as negative when they exhibit neither a local nor a global semantic connection. By imposing this strict, two-tiered validation, we effectively purify the set of positive and negative pairs, ensuring a more accurate and stable optimization process.
Based on the above analysis, for the nearest-neighbor graph $G_n$, we identify its disjoint subgraphs as different classes using the connected component labeling algorithm. This algorithm efficiently partitions the set of nodes into components $\{C_1, C_2, \ldots, C_K\}$, where any two nodes within a component $C_k$ are connected by a path. We treat each component as a pseudo-label, assuming that all texts within it are semantically similar. Let $C(D_i)$ be the function that returns the component ID for text $D_i$. From this, we derive a binary pseudo-label matrix $P \in \{0, 1\}^{N \times N}$ over the $N$ text samples:
$$P_{ij} = \begin{cases} 1 & \text{if } C(D_i) = C(D_j) \\ 0 & \text{otherwise} \end{cases}$$
Here, $P_{ij} = 1$ signifies that texts $D_i$ and $D_j$ belong to the same semantic component. Based on the neighborhood structure information and the class semantic information, we construct the reliable positive pair set $R_i^p$ and negative pair set $R_i^n$ as follows:
$$R_i^p = \{\, j \mid G_{n,ij} = 1,\ P_{ij} = 1,\ j \in [1, N] \,\}$$
$$R_i^n = \{\, j \mid G_{n,ij} = 0,\ P_{ij} = 0,\ j \in [1, N] \,\}$$
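A sketch combining the connected-component pseudo-labels with the two-tiered pair filtering; SciPy's connected_components stands in for the labeling algorithm.

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components

def reliable_pairs(g_n):
    """Return boolean masks for R^p (neighbors within the same component)
    and R^n (neither neighbors nor in the same component)."""
    _, comp = connected_components(csr_matrix(g_n), directed=False)
    same = comp[:, None] == comp[None, :]        # pseudo-label matrix P
    pos = (g_n == 1) & same                      # local AND global agreement
    neg = (g_n == 0) & ~same                     # neither local nor global link
    np.fill_diagonal(pos, False)                 # no self-pairs
    np.fill_diagonal(neg, False)
    return pos, neg
```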
Then, the loss $\mathcal{L}_{nsl}$ of neighborhood-invariant semantic learning is defined as:
$$\mathcal{L}_{nsl} = -\log \frac{\sum_{j \in R_i^p} \exp(\mathrm{sim}(z_i, \bar{z}_j))}{\sum_{j \in R_i^n} \exp(\mathrm{sim}(z_i, \bar{z}_j)) + \sum_{j \in R_i^p} \exp(\mathrm{sim}(z_i, z_j))}$$
where $\exp(\cdot)$ is the exponential function and $\mathrm{sim}(\cdot, \cdot)$ is the cosine similarity function.
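A batch-level PyTorch sketch of $\mathcal{L}_{nsl}$, assuming the boolean masks from the pair construction above and averaging over anchors; the small epsilon guarding against anchors with empty positive sets is an implementation assumption.

```python
import torch
import torch.nn.functional as F

def nsl_loss(z, z_bar, pos_mask, neg_mask, eps=1e-12):
    """Neighborhood-invariant loss: pull reliable positives together,
    push reliable negatives apart. pos_mask/neg_mask: boolean (N, N)."""
    z = F.normalize(z, dim=-1)                   # so dot product = cosine sim
    z_bar = F.normalize(z_bar, dim=-1)
    cross = torch.exp(z @ z_bar.T)               # exp(sim(z_i, z_bar_j))
    self_ = torch.exp(z @ z.T)                   # exp(sim(z_i, z_j))
    num = (cross * pos_mask).sum(dim=1)
    den = (cross * neg_mask).sum(dim=1) + (self_ * pos_mask).sum(dim=1)
    return -torch.log((num + eps) / (den + eps)).mean()
```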

2.4. The Overall Loss Function

The model is trained end-to-end by optimizing a composite objective that linearly combines the hard-aware cross-entropy loss, the neighborhood-invariant semantic learning loss, and the sample-invariant representation learning loss:
$$\mathcal{L} = \mathcal{L}_{hce} + \lambda \mathcal{L}_{nsl} + \delta \mathcal{L}_{spl}$$
where $\lambda$ and $\delta$ are trade-off parameters and $\mathcal{L}_{hce}$ is the hard-aware cross-entropy loss:
$$\mathcal{L}_{hce} = -\frac{1}{N} \sum_{i=1}^{N} (1 - q_{i,y_i})^{\tau} \log(q_{i,y_i})$$
where $q_{i,y_i}$ is the predicted probability for the true class $y_i$ of the $i$-th text sample, obtained by feeding the fusion representation $Z_i$ into a linear classifier. The core idea is to scale the cross-entropy loss with the modulating factor $(1 - q_{i,y_i})^{\tau}$, which diminishes the contribution of easy samples and, in turn, magnifies the focus on hard-to-classify samples. The detailed training process is shown in Algorithm 1.
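A sketch of the hard-aware term, in the spirit of a focal loss; the value of $\tau$ here is illustrative, as the paper does not report its setting.

```python
import torch

def hard_aware_ce(logits, targets, tau=2.0):
    """L_hce = -(1/N) sum_i (1 - q_{i,y_i})^tau * log q_{i,y_i}."""
    q = torch.softmax(logits, dim=-1)
    q_y = q.gather(1, targets.unsqueeze(1)).squeeze(1)  # prob of the true class
    return ((1.0 - q_y).pow(tau) * (-torch.log(q_y))).mean()
```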
Algorithm 1 Multi-granularity Invariant Structure Learning (MISL)

1: /* Training Phase */
2: Input: Training dataset $\mathcal{D} = \{(D_i, y_i)\}_{i=1}^{N}$; the category number $C$; loss weights $\lambda$, $\delta$.
3: Initialize: the parameters $\theta$ of MISL.
4: while not converged do
5:   Sample a batch of text documents $\{(D_i, y_i)\}$ from $\mathcal{D}$.
6:   for each document $D_i$ in the batch do
7:     Construct the statistical graph $G_s$, linguistic graph $G_l$, and knowledge graph $G_k$.
8:     Obtain view-specific representations $Z_s$, $Z_l$, $Z_k$.
9:     Fuse representations to get the final representation $Z_i = \mathrm{Fusion}(Z_s, Z_l, Z_k)$.
10:    Generate an augmented view $\bar{D}_i = \mathrm{Aug}(D_i)$.
11:    Obtain the augmented representation $\bar{Z}_i$ using the same process as lines 7–9.
12:  end for
13:  Calculate the hard-aware cross-entropy loss $\mathcal{L}_{hce}$.
14:  Calculate the sample-invariant representation learning loss $\mathcal{L}_{spl}$.
15:  Calculate the neighborhood-invariant semantic learning loss $\mathcal{L}_{nsl}$.
16:  Compute the total loss: $\mathcal{L} = \mathcal{L}_{hce} + \lambda \mathcal{L}_{nsl} + \delta \mathcal{L}_{spl}$.
17:  Update model parameters $\theta$ using gradient descent.
18: end while
19: Output: The model parameters $\theta$.
20: /* Inference Phase */
21: Input: A test document $D_{test}$.
22: Apply the trained model with parameters $\theta$ to $D_{test}$.
23: Output: Label predictions for $D_{test}$.

3. Experimental Evaluation

3.1. Setup

Datasets. Three text datasets are used to evaluate MISL: Ohsumed, TagMyNews, and Snippets. Following prior work, a small pool of 40 labeled documents is randomly selected for each class and evenly partitioned into 20 training and 20 validation documents per class; the majority of the remaining data forms the test set. Ohsumed contains 7400 documents (460 for training, 6.22%), 11,764 words, 4507 entities, 38 tags, and 23 classes. TagMyNews is the most extensive, with 32,549 documents (140 for training, 0.43%), 38,629 words, 14,734 entities, 42 tags, and 7 classes. Snippets contains 12,340 documents (160 for training, 1.30%), 29,040 words, 9737 entities, 34 tags, and 8 classes.
Evaluation Metrics. Following [31,32,33], Accuracy (ACC), Recall (Rec), Precision (Pre), and F1 are used to evaluate MISL; larger values indicate better performance. Reported results are averaged over five runs.
Comparison methods. Eleven text classification methods are selected as baselines, including QSIM [7], EMGAN [10], GTC [13], PTE [17], LSTMNN [19], NFS [20], Hy-TC [22], IEG-GAT [24], IGCL [30], BERT-avg [34] and BERT-cls [34].
Implementation details. The Adam optimizer is used to optimize the overall network with a learning rate of 0.0003 for up to 500 epochs. λ and δ are determined by grid search on the three datasets. An early stopping mechanism is employed to prevent overfitting: training terminates if the validation loss does not decrease for 10 consecutive epochs, and the model weights from the best-performing epoch are restored for the final evaluation. For Ohsumed, TagMyNews, and Snippets, λ is set to 1, 1, and 0.1, respectively; δ is set to 1 for all three datasets.
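The early-stopping rule can be sketched as follows; train_step and val_loss_fn are placeholder callables, and the PyTorch state_dict mechanics are an implementation assumption.

```python
import copy

def train_with_early_stopping(model, train_step, val_loss_fn,
                              patience=10, max_epochs=500):
    """Stop when the validation loss has not improved for `patience` epochs,
    then restore the best weights, as described in the implementation details."""
    best_loss, best_state, wait = float("inf"), None, 0
    for _ in range(max_epochs):
        train_step(model)                        # one epoch of optimization
        loss = val_loss_fn(model)
        if loss < best_loss:
            best_loss, wait = loss, 0
            best_state = copy.deepcopy(model.state_dict())
        else:
            wait += 1
            if wait >= patience:
                break                            # early stop
    if best_state is not None:
        model.load_state_dict(best_state)       # restore best epoch
    return model
```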

3.2. Comparison Evaluation

The results of the comparison evaluation, as detailed in Table 1, Table 2 and Table 3, demonstrate the superior performance of the proposed MISL model across all three datasets: Ohsumed, TagMyNews, and Snippets. In all evaluated metrics, our model consistently outperforms eleven baseline methods, often by a significant margin. These consistent and substantial gains across diverse datasets underscore the robustness and effectiveness of the proposed approach.
The remarkable performance of our proposed model can be attributed to several key methodological innovations designed to address the specific challenges of text classification in entrepreneurship policy. Firstly, the novel multi-view feature engineering module effectively mitigates semantic sparsity by integrating statistical, linguistic, and knowledge-based representations. This holistic approach captures a more comprehensive contextual understanding of the policy texts. Secondly, the sample-invariant representation learning strategy, which leverages data augmentation and mutual information maximization, enhances the model’s robustness and generalization capabilities, particularly in scenarios with limited labeled data. By forcing the model to learn the essential semantic core of the text, it becomes less susceptible to superficial variations. Lastly, the neighborhood-invariant semantic learning component improves the discriminative power of the final representations by promoting intra-class compactness and inter-class separability, leading to more accurate classification.
Meanwhile, to validate the statistical significance of our results, we performed paired t-tests between the proposed MISL model and each baseline method for every evaluation metric across all datasets, using the results of five independent runs with different random seeds. Significance levels are indicated in the tables: ** denotes p < 0.01 for a baseline compared against our model, confirming that the observed improvements are statistically significant.

3.3. Ablation Analysis

An ablation study is conducted to validate the contribution of each representation within our multi-view feature engineering module: statistical ( Z s ), linguistic ( Z l ), and knowledge-based ( Z k ). The results in Table 4 consistently demonstrate the overall effectiveness of the multi-view approach, as the complete model incorporating all three representations achieved the best performance across all datasets, highlighting a synergistic effect where each view provides complementary and indispensable information. Notably, the removal of the knowledge-based representation resulted in the most substantial decline in performance, underscoring its critical role in resolving semantic sparsity by enriching the text with crucial background and contextual information from external knowledge bases. Eliminating the statistical representation also led to a significant performance degradation, confirming that information derived from word co-occurrence patterns serves as the fundamental building block for capturing the core topics of the text. Finally, while the removal of the linguistic representation had a comparatively lesser impact, it still caused a noticeable decrease in performance, suggesting its value as an auxiliary source of fine-grained semantic clues. In summary, the analysis reveals a clear hierarchy of importance for the representations and confirms that their effective integration is key to the model’s superior performance.
Meanwhile, to assess the contribution of each component within our composite objective, we performed an ablation study on the loss functions, with results presented in Table 5. The results unequivocally establish the hard-aware cross-entropy loss ( L h c e ) as the foundational component; its removal led to a catastrophic collapse in performance, as it provides the primary supervised signal essential for the classification task. Furthermore, both the neighborhood-invariant ( L n s l ) and sample-invariant ( L s r l ) learning losses proved to be critically important. Removing the neighborhood-invariant loss significantly weakened the discriminative power of the feature space, while removing the sample-invariant loss diminished the model’s robustness and ability to generalize by learning the core semantics of the text. The superior performance of the complete model over any of its ablated variants strongly demonstrates a powerful synergy, confirming that the integration of a core classification objective with these two distinct representation learning objectives is essential for achieving a highly accurate and robust final model.

3.4. Parameter Analysis

λ and δ act as weighting factors, controlling the influence of the neighborhood-invariant and sample-invariant losses, respectively. To determine the optimal values for these hyperparameters, a comprehensive grid search is conducted over the set {100, 10, 1, 0.1, 0.01} for both λ and δ. This process allows for a thorough evaluation of how different balances between the loss components affect classification accuracy. The results in Figure 2 demonstrate a consistent pattern across all three datasets (Ohsumed, TagMyNews, and Snippets): the model's performance, measured by ACC and F1, consistently peaks when λ and δ are set within the range [0.1, 1]. This indicates that MISL is most effective when the contributions of the neighborhood-invariant and sample-invariant learning components are carefully balanced with the primary cross-entropy loss. When the weights are too high (e.g., 10 or 100), these specialized loss components may dominate the learning process, potentially overshadowing the fundamental classification task. Conversely, when the weights are too low (e.g., 0.01), their regularizing and structure-enforcing benefits are diminished, leading to suboptimal performance.

3.5. Fusion Analysis

To determine the most effective method for integrating the multi-view representations, we evaluate four distinct fusion operations: concatenation, average, weighting, and sum. The ACC of each method was tested on the Ohsumed, TagMyNews, and Snippets datasets. The results in Figure 3 consistently show that the concatenation and weighting operations are highly competitive, both significantly outperforming the average and sum methods across all three datasets. While the weighting mechanism yields strong results, it introduces additional learnable parameters and computational overhead. Given that concatenation achieves comparable or superior performance without this extra burden, it was selected as the fusion strategy for our model due to its balance of effectiveness and efficiency.

3.6. Model Analysis

Based on the comprehensive model analysis in Table 6, the proposed MISL framework demonstrates significant advantages in computational efficiency and practical deployment potential. The model achieves exceptional parameter efficiency, requiring substantially fewer parameters while maintaining competitive performance. The results conclusively show that our method successfully navigates the trade-off between model complexity and operational efficiency, offering a practical and effective approach for text classification tasks.

4. Conclusions

This paper addresses two core challenges in the text classification of entrepreneurship policy: the semantic sparsity arising from the intrinsic complexity of policy texts, and the model overfitting and poor generalization caused by the scarcity of labeled data in this specialized domain. To overcome these challenges, we propose an innovative multi-granularity invariant structure learning model. To combat semantic sparsity, the model first introduces a multi-view feature engineering module. This module constructs and fuses three distinct graph structures (statistical, linguistic, and knowledge-based) to capture textual information from multiple dimensions, thereby generating a comprehensive and rich semantic representation. To resolve the issue of model robustness in the context of data scarcity, we introduce a dual invariant structure learning framework that operates on two levels. First, sample-invariant representation learning uses data augmentation and mutual information maximization to help the model learn the essential and stable semantic core of a text, making it insensitive to superficial perturbations. Second, neighborhood-invariant semantic learning applies a contrastive objective on a nearest-neighbor graph to enhance intra-class compactness and inter-class separability in the feature space. In summary, through its carefully designed multi-view feature fusion and dual invariant learning mechanisms, the MISL model systematically addresses the key difficulties in entrepreneurship policy text classification, providing an effective solution for efficient, robust, and accurate classification of complex texts, even with limited data. While MISL demonstrates strong performance, we acknowledge several limitations that present opportunities for future work. First, the model's knowledge representation module relies on external knowledge bases, so its effectiveness can be constrained by their coverage and quality, potentially leading to information loss for entities or concepts not well represented in them. Second, the construction of multiple graph views and the subsequent graph convolutional operations introduce higher computational complexity than simpler text classification models, which may limit its application in scenarios requiring strict real-time processing.

Author Contributions

Conceptualization, X.S.; methodology, X.S. and M.Y.; formal analysis, X.S. and M.Y.; writing—original draft preparation, X.S.; writing—review and editing, X.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by 1. National Natural Science Foundation of China (Grant No. 72091313); 2. Changchun Philosophy and Social Sciences Planning Project (Grant No. CSKT2024ZX—107).

Data Availability Statement

The data presented in this study are openly available on GitHub at https://github.com/Machinelearning20/MISL, accessed on 1 November 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Allam, H.; Makubvure, L.; Gyamfi, B.; Graham, K.N.; Akinwolere, K. Text classification: How machine learning is revolutionizing text categorization. Information 2025, 16, 130. [Google Scholar] [CrossRef]
  2. Zhang, D.W.; Mi, R.X.; Zhou, P.Y.; Jin, D.; Zhang, M.; Song, T. Optimization Method for Imbalanced Text Classification Based on Large Models. Softw. Eng. 2025, 28, 47–50+78. [Google Scholar] [CrossRef]
  3. Wu, Y.; Wan, J. A survey of text classification based on pre-trained language model. Neurocomputing 2025, 616, 128921. [Google Scholar] [CrossRef]
  4. Gao, N.; Wang, Y.; Chen, P.; Zheng, X. Graph-Guided Multi-view Text Classification: Advanced Solutions for Fast Inference. In International Conference on Artificial Neural Networks; Springer: Cham, Switzerland, 2024; pp. 126–142. [Google Scholar]
  5. Gao, J.; Liu, M.; Li, P.; Zhang, J.; Chen, Z. Deep Multiview Adaptive Clustering with Semantic Invariance. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 12965–12978. [Google Scholar] [CrossRef] [PubMed]
  6. Gao, J.; Liu, M.; Li, P.; Laghari, A.A.; Javed, A.R.; Victor, N.; Gadekallu, T.R. Deep Incomplete Multiview Clustering via Information Bottleneck for Pattern Mining of Data in Extreme-environment IoT. IEEE Internet Things J. 2023, 11, 26700–26712. [Google Scholar] [CrossRef]
  7. Gao, H.; Zhang, P.; Zhang, J.; Yang, C. Qsim: A quantum-inspired hierarchical semantic interaction model for text classification. Neurocomputing 2025, 611, 128658. [Google Scholar] [CrossRef]
  8. Reusens, M.; Stevens, A.; Tonglet, J.; De Smedt, J.; Verbeke, W.; Vanden Broucke, S.; Baesens, B. Evaluating text classification: A benchmark study. Expert Syst. Appl. 2024, 254, 124302. [Google Scholar] [CrossRef]
  9. Jamshidi, S.; Mohammadi, M.; Bagheri, S.; Najafabadi, H.E.; Rezvanian, A.; Gheisari, M.; Ghaderzadeh, M.; Shahabi, A.S.; Wu, Z. Effective text classification using BERT, MTM LSTM, and DT. Data Knowl. Eng. 2024, 151, 102306. [Google Scholar] [CrossRef]
  10. Ai, W.; Wei, Y.; Shao, H.; Shou, Y.; Meng, T.; Li, K. Edge-enhanced minimum-margin graph attention network for short text classification. Expert Syst. Appl. 2024, 251, 124069. [Google Scholar] [CrossRef]
  11. Sun, G.; Cheng, Y.; Zhang, Z.; Tong, X.; Chai, T. Text classification with improved word embedding and adaptive segmentation. Expert Syst. Appl. 2024, 238, 121852. [Google Scholar] [CrossRef]
  12. Liu, Y.; Li, M.; Pang, W.; Giunchiglia, F.; Huang, L.; Feng, X.; Guan, R. Boosting Short Text Classification with Multi-Source Information Exploration and Dual-Level Contrastive Learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Philadelphia, PA, USA, 25 February–4 March 2025; Volume 39, pp. 24696–24704. [Google Scholar]
  13. Li, X.; Wang, B.; Wang, Y.; Wang, M. Graph-based text classification by contrastive learning with text-level graph augmentation. ACM Trans. Knowl. Discov. Data 2024, 18, 1–21. [Google Scholar] [CrossRef]
  14. Yang, Z.; Emmert-Streib, F. Optimal performance of Binary Relevance CNN in targeted multi-label text classification. Knowl. Based Syst. 2024, 284, 111286. [Google Scholar] [CrossRef]
  15. Wei, Y.; Wang, Z.; Li, J.; Li, T. The unsupervised short text classification method based on GCN encoder–decoder and local enhancement. Expert Syst. Appl. 2025, 282, 127678. [Google Scholar] [CrossRef]
  16. Zhu, Y.; Wang, Y.; Mu, J.; Li, Y.; Qiang, J.; Yuan, Y.; Wu, X. Short text classification with soft knowledgeable prompt-tuning. Expert Syst. Appl. 2024, 246, 123248. [Google Scholar] [CrossRef]
  17. Tang, J.; Qu, M.; Mei, Q. Pte: Predictive text embedding through large-scale heterogeneous text networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 1165–1174. [Google Scholar]
  18. Soni, S.; Chouhan, S.S.; Rathore, S.S. TextConvoNet: A convolutional neural network based architecture for text classification. Appl. Intell. 2023, 53, 14249–14268. [Google Scholar] [CrossRef] [PubMed]
  19. Pozzi, A.; Incremona, A.; Tessera, D.; Toti, D. Mitigating exposure bias in large language model distillation: An imitation learning approach. Neural Comput. Appl. 2025, 37, 12013–12029. [Google Scholar] [CrossRef]
  20. Okkalioglu, M. A novel redistribution-based feature selection for text classification. Expert Syst. Appl. 2024, 246, 123119. [Google Scholar] [CrossRef]
  21. Behzadidoost, R.; Mahan, F.; Izadkhah, H. Granular computing-based deep learning for text classification. Inf. Sci. 2024, 652, 119746. [Google Scholar] [CrossRef]
  22. Maragheh, H.K.; Gharehchopogh, F.S.; Majidzadeh, K.; Sangar, A.B. A hybrid model based on convolutional neural network and long short-term memory for multi-label text classification. Neural Process. Lett. 2024, 56, 42. [Google Scholar] [CrossRef]
  23. Cheng, Q.; Shi, W. Hierarchical multi-label text classification of tourism resources using a label-aware dual graph attention network. Inf. Process. Manag. 2025, 62, 103952. [Google Scholar] [CrossRef]
  24. Liu, H.; Huang, X.; Liu, X. Improve label embedding quality through global sensitive GAT for hierarchical text classification. Expert Syst. Appl. 2024, 238, 122267. [Google Scholar] [CrossRef]
  25. Yuan, B.; Chen, Y.; Tan, Z.; Jinyan, W.; Liu, H.; Zhang, Y. Label distribution learning-enhanced dual-knn for text classification. In Proceedings of the 2024 SIAM International Conference on Data Mining (SDM), SIAM, Houston, TX, USA, 18–20 April 2024; pp. 400–408. [Google Scholar]
  26. Liang, Z.; Guo, J.; Qiu, W.; Huang, Z.; Li, S. When graph convolution meets double attention: Online privacy disclosure detection with multi-label text classification. Data Min. Knowl. Discov. 2024, 38, 1171–1192. [Google Scholar] [CrossRef]
  27. Wang, Z.; Zheng, X.; Zhang, J.; Zhang, M. Three-branch BERT-based text classification network for gastroscopy diagnosis text. Int. J. Crowd Sci. 2024, 8, 56–63. [Google Scholar] [CrossRef]
  28. Wen, Z.; Fang, Y. Prompt tuning on graph-augmented low-resource text classification. IEEE Trans. Knowl. Data Eng. 2024, 36, 9080–9095. [Google Scholar] [CrossRef]
  29. Zhang, Y.; Yang, R.; Xu, X.; Li, R.; Xiao, J.; Shen, J.; Han, J. Teleclass: Taxonomy enrichment and llm-enhanced hierarchical text classification with minimal supervision. In Proceedings of the ACM on Web Conference 2025, Sydney, Australia, 28 April–2 May 2025; pp. 2032–2042. [Google Scholar]
  30. Liu, Y.; Huang, L.; Giunchiglia, F.; Feng, X.; Guan, R. Improved graph contrastive learning for short text classification. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 26–27 February 2024; Volume 38, pp. 18716–18724. [Google Scholar]
  31. Xu, J.F.; Wu, Y.C. A news text classification study based on BERT-BiLSTM-CNN model. Softw. Eng. 2023, 26, 11–15. [Google Scholar] [CrossRef]
  32. Dou, Z.; Bai, G.; Han, Z.; Li, W.; Li, Y. PFGL-Net: A Personalized Federated Graph Learning Framework for Privacy-Preserving Disease Prediction. J. Artif. Intell. Res. 2025, 2, 12–23. [Google Scholar] [CrossRef]
  33. Gao, I.J.; Liu, G.; Zhu, B.; Zhou, S.; Zheng, H.; Liao, X. Multi-level attention and contrastive learning for enhanced text classification with an optimized transformer. In Proceedings of the 2025 5th International Conference on Consumer Electronics and Computer Engineering (ICCECE), IEEE, Dongguan, China, 28 February–2 March 2025; pp. 499–503. [Google Scholar]
  34. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
Figure 1. The illustration of the proposed method.
Figure 2. The parameter analysis on three datasets. (a) Ohsumed (ACC). (b) Ohsumed (F1). (c) TagMyNews (ACC). (d) TagMyNews (F1). (e) Snippets (ACC). (f) Snippets (F1).
Figure 3. Four fusion operation analysis on three datasets.
Table 1. Results (%) of four metrics on the Ohsumed dataset. The average results with standard deviation over five random runs are reported. Bold indicates the best result. The symbols ** indicate that the performance is significantly worse than the proposed model (Ours) at the p < 0.01 level based on paired t-tests.

| Method | Accuracy | Recall | Precision | F1 |
| --- | --- | --- | --- | --- |
| QSIM | 44.58 ± 0.26 ** | 27.75 ± 0.42 ** | 40.36 ± 0.42 ** | 28.67 ± 0.33 ** |
| EMGAN | 40.12 ± 3.12 ** | 23.31 ± 3.36 ** | 37.74 ± 3.02 ** | 25.14 ± 2.64 ** |
| GTC | 36.95 ± 2.38 ** | 20.78 ± 1.17 ** | 35.54 ± 1.85 ** | 23.57 ± 1.82 ** |
| PTE | 37.58 ± 3.87 ** | 18.74 ± 2.72 ** | 32.00 ± 2.97 ** | 16.40 ± 2.63 ** |
| LSTMNN | 38.74 ± 2.71 ** | 20.12 ± 1.11 ** | 34.44 ± 2.04 ** | 18.07 ± 2.47 ** |
| NFS | 39.95 ± 1.22 ** | 21.45 ± 1.03 ** | 35.99 ± 1.77 ** | 20.47 ± 0.99 ** |
| BERT-avg | 42.68 ± 1.17 ** | 24.82 ± 1.30 ** | 37.26 ± 1.41 ** | 25.89 ± 1.38 ** |
| BERT-cls | 43.24 ± 0.87 ** | 25.69 ± 0.92 ** | 37.75 ± 1.03 ** | 26.46 ± 0.97 ** |
| Hy-TC | 40.07 ± 0.77 ** | 23.11 ± 0.65 ** | 35.21 ± 0.73 ** | 26.89 ± 0.46 ** |
| IEG-GAT | 43.24 ± 0.54 ** | 25.86 ± 0.47 ** | 37.77 ± 0.68 ** | 27.34 ± 0.52 ** |
| IGCL | 42.92 ± 1.25 ** | 25.64 ± 1.12 ** | 38.10 ± 1.78 ** | 27.69 ± 0.73 ** |
| Ours | **47.23 ± 0.57** | **29.23 ± 0.72** | **43.54 ± 0.69** | **30.72 ± 0.38** |
Table 2. Results (%) of four metrics on the TagMyNews dataset. The average results with standard deviation over five random runs are reported. Bold indicates the best result. The symbols ** indicate that the performance is significantly worse than the proposed model (Ours) at the p < 0.01 level based on paired t-tests.

| Method | Accuracy | Recall | Precision | F1 |
| --- | --- | --- | --- | --- |
| QSIM | 56.71 ± 0.31 ** | 47.77 ± 0.39 ** | 54.31 ± 0.53 ** | 50.71 ± 0.25 ** |
| EMGAN | 50.24 ± 2.74 ** | 45.36 ± 2.57 ** | 50.39 ± 2.68 ** | 46.77 ± 1.90 ** |
| GTC | 43.75 ± 2.21 ** | 25.41 ± 1.33 ** | 40.12 ± 2.05 ** | 26.11 ± 1.27 ** |
| PTE | 46.12 ± 2.37 ** | 27.67 ± 2.42 ** | 43.42 ± 2.28 ** | 29.12 ± 2.34 ** |
| LSTMNN | 46.33 ± 2.22 ** | 28.01 ± 1.04 ** | 44.07 ± 1.90 ** | 30.32 ± 1.72 ** |
| NFS | 50.12 ± 0.94 ** | 33.74 ± 0.87 ** | 49.04 ± 1.41 ** | 33.74 ± 0.81 ** |
| BERT-avg | 54.84 ± 0.89 ** | 44.78 ± 1.23 ** | 52.23 ± 0.77 ** | 48.17 ± 1.54 ** |
| BERT-cls | 52.11 ± 1.65 ** | 43.75 ± 1.58 ** | 50.92 ± 1.66 ** | 46.36 ± 1.86 ** |
| Hy-TC | 53.00 ± 0.54 ** | 44.22 ± 0.44 ** | 51.06 ± 0.44 ** | 47.99 ± 0.37 ** |
| IEG-GAT | 56.84 ± 0.70 ** | 48.03 ± 0.68 ** | 53.49 ± 0.93 ** | 51.03 ± 0.60 ** |
| IGCL | 55.27 ± 0.96 ** | 46.75 ± 0.85 ** | 52.33 ± 1.02 ** | 50.70 ± 0.57 ** |
| Ours | **60.74 ± 0.44** | **51.47 ± 0.37** | **58.74 ± 0.69** | **55.88 ± 0.62** |
Table 3. Results (%) of four metrics on the Snippets dataset. The average results with standard deviation over five random runs are reported. Bold indicates the best result. The symbols ** indicate that the performance is significantly worse than the proposed model (Ours) at the p < 0.01 level based on paired t-tests.

| Method | Accuracy | Recall | Precision | F1 |
| --- | --- | --- | --- | --- |
| QSIM | 75.12 ± 0.54 ** | 76.09 ± 0.67 ** | 75.86 ± 0.60 ** | 75.32 ± 0.57 ** |
| EMGAN | 69.69 ± 1.74 ** | 68.57 ± 2.02 ** | 69.22 ± 2.14 ** | 68.25 ± 1.87 ** |
| GTC | 60.12 ± 2.44 ** | 62.12 ± 2.24 ** | 59.96 ± 1.69 ** | 60.57 ± 1.93 ** |
| PTE | 56.62 ± 2.20 ** | 55.22 ± 2.75 ** | 56.12 ± 2.41 ** | 56.37 ± 2.23 ** |
| LSTMNN | 58.74 ± 2.36 ** | 58.75 ± 2.44 ** | 54.94 ± 2.35 ** | 58.07 ± 2.23 ** |
| NFS | 67.39 ± 1.47 ** | 67.21 ± 1.45 ** | 67.35 ± 1.99 ** | 67.20 ± 1.47 ** |
| BERT-avg | 70.74 ± 0.88 ** | 71.04 ± 0.67 ** | 70.42 ± 1.25 ** | 71.07 ± 0.93 ** |
| BERT-cls | 69.25 ± 0.85 ** | 70.07 ± 0.83 ** | 69.37 ± 1.34 ** | 70.28 ± 0.76 ** |
| Hy-TC | 68.55 ± 0.92 ** | 69.97 ± 0.74 ** | 68.82 ± 0.89 ** | 69.89 ± 0.86 ** |
| IEG-GAT | 73.24 ± 0.64 ** | 74.86 ± 0.89 ** | 75.49 ± 0.84 ** | 75.27 ± 0.93 ** |
| IGCL | 72.20 ± 1.75 ** | 72.95 ± 0.68 ** | 71.19 ± 1.44 ** | 72.77 ± 1.36 ** |
| Ours | **79.75 ± 0.54** | **78.35 ± 0.88** | **78.62 ± 0.76** | **78.38 ± 0.75** |
Table 4. View representation ablation on three datasets in terms of Accuracy and F1 (%). The average results with standard deviation over five random runs are reported. Bold indicates the best result.

| Variant | Ohsumed Accuracy | Ohsumed F1 | TagMyNews Accuracy | TagMyNews F1 | Snippets Accuracy | Snippets F1 |
| --- | --- | --- | --- | --- | --- | --- |
| Z w/o Z_s | 45.24 ± 0.74 | 29.57 ± 0.87 | 58.57 ± 1.11 | 54.64 ± 1.44 | 77.46 ± 2.45 | 76.56 ± 1.92 |
| Z w/o Z_l | 46.37 ± 0.85 | 29.75 ± 0.69 | 58.28 ± 0.82 | 55.55 ± 0.73 | 78.08 ± 0.76 | 77.89 ± 0.89 |
| Z w/o Z_k | 45.66 ± 0.43 | 28.89 ± 0.35 | 58.36 ± 0.38 | 53.92 ± 0.42 | 77.67 ± 0.59 | 76.43 ± 0.43 |
| Ours | **47.23 ± 0.57** | **30.72 ± 0.38** | **60.74 ± 0.44** | **55.88 ± 0.62** | **79.75 ± 0.54** | **78.38 ± 0.75** |
Table 5. Loss ablation on three datasets in terms of Accuracy and F1 (%). The average results with standard deviation over five random runs are reported. Bold indicates the best result.

| Variant | Ohsumed Accuracy | Ohsumed F1 | TagMyNews Accuracy | TagMyNews F1 | Snippets Accuracy | Snippets F1 |
| --- | --- | --- | --- | --- | --- | --- |
| L w/o L_hce | 34.12 ± 1.12 | 17.02 ± 1.46 | 42.39 ± 1.87 | 35.18 ± 2.01 | 57.57 ± 1.78 | 56.25 ± 1.53 |
| L w/o L_nsl | 44.96 ± 0.73 | 27.49 ± 0.66 | 57.82 ± 0.83 | 53.74 ± 0.97 | 76.55 ± 0.65 | 75.29 ± 0.72 |
| L w/o L_spl | 43.66 ± 0.84 | 27.07 ± 0.76 | 56.57 ± 0.59 | 52.62 ± 0.55 | 75.33 ± 1.22 | 75.03 ± 1.36 |
| Ours | **47.23 ± 0.57** | **30.72 ± 0.38** | **60.74 ± 0.44** | **55.88 ± 0.62** | **79.75 ± 0.54** | **78.38 ± 0.75** |
Table 6. Model analysis in terms of parameters, inference time, and FLOPS on the Ohsumed dataset.

| Metric | QSIM | LSTMNN | BERT-avg | Ours |
| --- | --- | --- | --- | --- |
| Parameters | 81.65 M | 564 K | 110 M | 779 K |
| Inference time | 0.85 s | 0.15 s | 1.19 s | 0.18 s |
| FLOPS | 15.39 | 3.41 | 11.4 | 5.39 |