Article

A Novel Trustworthy Toxic Text Detection Method with Entropy-Oriented Invariant Representation Learning for Portuguese Community

1 School of European Language and Culture Studies, Dalian University of Foreign Languages, Dalian 116044, China
2 University International College, Macau University of Science and Technology, Macau 999078, China
3 Graduate School of Education, Dalian University of Technology, Dalian 116024, China
* Author to whom correspondence should be addressed.
Mathematics 2025, 13(13), 2136; https://doi.org/10.3390/math13132136
Submission received: 20 May 2025 / Revised: 28 June 2025 / Accepted: 28 June 2025 / Published: 30 June 2025
(This article belongs to the Special Issue Artificial Intelligence and Data Science, 2nd Edition)

Abstract

With the rapid development of digital technologies, data-driven methods have demonstrated commendable performance in the toxic text detection task. However, several challenges remain unresolved, including the inability to fully capture the nuanced semantic information embedded in text, the lack of robust mechanisms to handle the inherent uncertainty of language, and the reliance on static fusion strategies for multi-view information. To address these issues, this paper proposes a comprehensive and dynamic toxic text detection method. Specifically, we design a multi-view feature augmentation module by combining bidirectional long short-term memory (BiLSTM) and BERT as a dual-stream framework. This module captures a more holistic representation of semantic information by learning both local and global features of texts. Next, we introduce an entropy-oriented invariant learning module that minimizes the conditional entropy between view-specific representations to align consistent information, thereby enhancing representation generalization. Meanwhile, we devise a trustworthy text recognition module that employs the Dirichlet function to model the uncertainty of text predictions. We then apply an evidence-based information fusion strategy to dynamically aggregate decision information between views with the help of the Dirichlet distribution. Through these components, the proposed method aims to overcome the limitations of traditional methods and provide a more accurate and reliable solution for toxic language detection. Finally, extensive experiments on two real-world datasets show the effectiveness and superiority of the proposed method in comparison with eleven baseline methods.

1. Introduction

In the era of the explosive growth of user-generated online content, toxic text detection has emerged as a critical challenge at the intersection of cybersecurity and social governance [1,2,3,4,5]. Toxic text, spanning hate speech, harassment, misinformation, and other harmful content, poses significant risks to mental health, social stability, and platform integrity [6,7,8]. Traditional manual moderation struggles to cope with the scale and diversity of modern digital interactions, necessitating automated solutions rooted in data-driven intelligence. By leveraging advances in natural language processing, machine learning, and big data analytics, researchers are now developing data-driven hybrid frameworks that combine analytical reasoning, computational modeling, and adaptive learning to decode implicit toxicity patterns [9,10,11].
Existing approaches to toxic text detection can be broadly categorized as rule-driven or data-driven. Rule-driven approaches depend on predefined lexicons, syntactic rules, or pattern-matching algorithms to spot toxic content, offering interpretability through tools like keyword blacklists or regular expressions [12,13]. However, they struggle to adapt to changing language subtleties, slang, or culturally specific toxicity. Data-driven approaches utilize deep learning to automatically identify toxicity via latent semantic patterns in text embeddings and can be further divided into several subgroups [13,14,15]. Convolutional Neural Network (CNN)-based methods employ convolutional operations to extract salient features from text data, effectively capturing local features and spatial hierarchies in the input [16,17]. Transformer-based approaches utilize self-attention mechanisms to model intricate relationships between words, allowing the model to weigh the importance of different words in relation to each other and capture long-range dependencies [18,19]. Long Short-Term Memory (LSTM)-based methods process sequential data to capture contextual information, making them adept at understanding the order and flow of text sequences [20,21]. BERT-based approaches benefit from pretraining on large-scale text corpora, enabling them to capture rich linguistic features and contextual information through masked language modeling and next sentence prediction tasks. Ensemble-based learning methods combine the strengths of multiple models to improve performance, reducing the risk of overfitting and enhancing generalization [22,23]. In recent years, with the emergence of large-scale language models (LLMs), LLM-based methods have been increasingly applied to toxicity detection tasks through fine-tuning or prompt-based adaptations [24,25,26]. While they offer powerful general language understanding capabilities, their application to specialized tasks like toxic text detection still requires careful adaptation. Unlike traditional data-driven models that are specifically fine-tuned for toxicity detection, LLMs can sometimes provide a broader contextual awareness but may lack the specialized focus needed for high-precision detection in specific domains.
Despite the encouraging performance achieved by data-driven methods, they still face several limitations: First, they may fail to capture the rich semantic information in text, especially when dealing with the complex grammatical structures unique to Portuguese. For example, Portuguese has an intricate system of verb conjugations and noun declensions that vary based on tense, mood, aspect, gender, and number. The prevalence of polysemy, where words can have multiple meanings depending on the context, also poses challenges. Second, data-driven approaches frequently struggle to address the uncertainty inherent in model decision making. The decision-making process of these models is often based on probabilistic outputs and confidence scores, which can be unreliable due to factors such as limited training data, class imbalance, and the presence of noise in the input text. Third, many data-driven methods rely on static fusion strategies to aggregate multi-view information for model decision making. These strategies typically combine features from different modalities or sources using fixed weights or predefined rules. However, this approach cannot adaptively adjust to the characteristics of different samples. In reality, the importance of various features or views may vary significantly across samples. For instance, in some cases, textual content might be more critical, while in others, structural or contextual information could be more relevant. Static fusion strategies fail to account for these variations, which can lead to suboptimal model performance. They do not provide the flexibility needed to optimally integrate multi-view information for each specific sample, thus limiting the overall effectiveness of the model in complex and dynamic text analysis tasks.
To address these challenges, this paper proposes a comprehensive and dynamic toxic text detection method for recognizing sustainable development insights, which integrates multi-view feature augmentation, entropy-oriented invariant learning, trustworthy comment recognition, and evidence-based information fusion. Specifically, the multi-view feature augmentation defines a dual-stream encoding architecture with the help of BiLSTM and BERT, to capture the local and global information of text. Entropy-oriented invariant learning minimizes the conditional entropy between representations extracted by different feature encoders to align complementary information, which improves the generalization of representations. Meanwhile, trustworthy comment recognition conducts the Dirichlet function to measure the uncertainty estimation of model predictions, which ensures the reliability of the model. Finally, evidence-based information fusion dynamically aggregates information from multiple views based on the uncertainty of each view, ensuring that the model can adaptively leverage the most informative features for each sample. Through these components, our method aims to overcome the limitations of traditional approaches and provide a more accurate and reliable solution for toxic language detection in sustainable development insights.
The key contributions of this paper are as follows:
  • We introduce a dual-stream framework combining BiLSTM and BERT with entropy-oriented invariant learning, effectively capturing comprehensive semantic features and enhancing generalization in toxic text detection.
  • We also propose a novel trustworthy comment recognition strategy using the Dirichlet function for uncertainty estimation, coupled with evidence-based dynamic information fusion, which significantly improves the reliability and accuracy of detection results.
  • Extensive experiments conducted on real-world datasets verify that the proposed method provides a more effective and reliable solution for toxic text detection in sustainable development insights.
The subsequent sections of this paper are structured as follows: Section 2 provides an in-depth exposition of the proposed method. Section 3 presents a comprehensive evaluation of experiment results. Section 4 concludes the study by summarizing key findings and outlining future research directions.

2. The Proposed Method

Mathematically, let $\mathcal{D} = \{x_1, \ldots, x_N\}$ denote a sustainable development insight text dataset containing $N$ textual posts. The goal of the toxic text detection task is to learn a prediction function $f: \mathcal{D} \rightarrow \{0, 1\}$. To this end, a trustworthy toxic text detection method is proposed for perceiving sustainable development insights, which contains multi-view feature augmentation, entropy-oriented invariant learning, trustworthy comment recognition, and evidence-based information fusion, as shown in Figure 1.

2.1. Multi-View Feature Augmentation

To address the complexity and diversity of toxic text patterns in sustainable development insights, we propose a dual-stream parallel feature extraction framework combining Bidirectional Long Short-Term Memory (BiLSTM) and BERT. This architecture leverages complementary advantages of both models to enhance feature representation.
Global semantic modeling: The BERT model, based on the Transformer’s self-attention mechanism, achieves three-dimensional semantic modeling of global text. Its core strength lies in synchronously capturing semantic correlations between any positions in a sequence through parallelized computation, breaking the local window limitations of traditional models and constructing cross-sentence semantic topological networks at the word-vector level. This unique capability enables precise parsing of deep semantic structures (e.g., logical reasoning, coreference resolution), particularly through context-sensitive word representations acquired via large-scale pretraining, which accurately distinguishes semantic differences of identical words in contexts.
Given an input text sequence $x_i = \{x_i^1, x_i^2, \ldots, x_i^L\}$, where $L$ denotes the length of the text, BERT generates contextualized embeddings through:
$$E_{\mathrm{BERT}} = \mathrm{LayerNorm}(E_{\mathrm{token}} + E_{\mathrm{segment}} + E_{\mathrm{position}})$$
where $E_{\mathrm{token}}$ denotes the token embeddings, $E_{\mathrm{segment}}$ denotes the segment embeddings (to distinguish between sentences), and $E_{\mathrm{position}}$ denotes the positional embeddings. Then, BERT applies multiple Transformer encoder layers, each consisting of a multi-head self-attention mechanism $\mathrm{MSA}(\cdot)$ and a feed-forward neural network $\mathrm{FFN}(\cdot)$:
$$\hat{Z} = \mathrm{MSA}(\mathrm{Norm}(E_{\mathrm{BERT}})), \qquad m_1 = \hat{Z} + E_{\mathrm{BERT}}$$
$$\hat{m}_s = \mathrm{MSA}(\mathrm{Norm}(m_{s-1})) + m_{s-1}, \qquad m_s = \mathrm{FFN}(\mathrm{Norm}(\hat{m}_s)) + \hat{m}_s$$
where $s$ denotes the number of encoding layers. The final representations of BERT are denoted as $z_i$, $i = 1, 2, \ldots, N$.
Local contextual dynamics: The BiLSTM model specializes in extracting local dynamic features of text streams through its gated recurrent architecture. By modeling bidirectional temporal sequences, it traces combinatorial patterns of evaluative elements (e.g., “screen-clarity-stunning”) in variable word orders. Even when evaluation subjects and sentiment words are separated by over 20 characters (e.g., “This phone, though I waited half a month for delivery, its AMOLED display effect truly...”), it reliably establishes precise modifier relationships. For fragmented expressions in sustainable development insight texts (e.g., frequent pronoun jumps like “this” or “it”), the model automatically repairs semantic discontinuities through hidden state transmission, effectively handling scenarios with omitted subjects like “Just received it with scratches, so disappointing.” Crucially, its incremental information processing mechanism inherently filters noise such as typos and emoji insertions.
Given an input text sequence $x_i = \{x_i^1, x_i^2, \ldots, x_i^L\}$, where $L$ denotes the length of the text, BiLSTM processes the sequence bidirectionally through two LSTM layers to generate the corresponding representations:
$$\overrightarrow{h}_i^t = \overrightarrow{\mathrm{LSTM}}(x_i^t, \overrightarrow{h}_i^{t-1}), \qquad \overleftarrow{h}_i^t = \overleftarrow{\mathrm{LSTM}}(x_i^t, \overleftarrow{h}_i^{t+1}), \qquad h_i^t = \omega_1 \overrightarrow{h}_i^t + \omega_2 \overleftarrow{h}_i^t$$
where $\overrightarrow{h}_i^t$ represents the forward hidden state and $\overleftarrow{h}_i^t$ represents the backward hidden state. These two hidden states are combined with weight coefficients $\omega_1$ and $\omega_2$ to form the final bidirectional hidden state $h_i^t$. Through this method, the model can consider contextual information from both directions simultaneously and update the network parameters during training through forward and backward propagation.
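To make the dual-stream design concrete, the following PyTorch sketch pairs a pretrained BERT encoder with a BiLSTM stream. It is a minimal illustration under stated assumptions: the checkpoint name, hidden size, weighted direction fusion, and pooling choices are ours, not the authors' reported configuration, and for simplicity the BiLSTM consumes BERT's token states rather than a separate embedding layer.

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class DualStreamEncoder(nn.Module):
    """Global (BERT) and local (BiLSTM) views of the same text."""
    def __init__(self, bert_name="neuralmind/bert-base-portuguese-cased",
                 lstm_hidden=384):
        super().__init__()
        self.bert = AutoModel.from_pretrained(bert_name)   # global stream
        emb_dim = self.bert.config.hidden_size
        self.bilstm = nn.LSTM(emb_dim, lstm_hidden,
                              batch_first=True, bidirectional=True)  # local stream
        # learnable omega_1, omega_2 weighting the two directions
        self.omega = nn.Parameter(torch.ones(2))

    def forward(self, input_ids, attention_mask):
        out = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        z = out.last_hidden_state[:, 0]                 # [CLS] as global view z
        states, _ = self.bilstm(out.last_hidden_state)  # (B, L, 2 * lstm_hidden)
        fwd, bwd = states.chunk(2, dim=-1)              # forward / backward halves
        # weighted direction fusion, then mean pooling (padding ignored for brevity)
        h = (self.omega[0] * fwd + self.omega[1] * bwd).mean(dim=1)  # local view h
        return z, h
```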

2.2. Entropy-Oriented Invariant Learning

The core motivation of view-invariant representation learning stems from addressing semantic inconsistency across multi-view features and overcoming generalization bottlenecks in models. Since BiLSTM and BERT extract features from distinct perspectives, i.e., local sequential dependencies and global contextual interactions, respectively, they inherently differ in syntactic sensitivity and semantic abstraction levels. This discrepancy may lead to shifts in the representation space of the same text across different models. Such view-specific biases can weaken the model’s ability to capture essential semantics. View-invariant representation learning mitigates this by constraining the distribution of representations across views, compelling the model to discard view-dependent interference and instead focus on high-order semantic information shared across views. Specifically, by minimizing conditional entropy between representations, the model implicitly builds semantic mapping bridges between views, aligning representations from different perspectives into unified semantic concepts in latent space. This approach not only resolves information conflicts during multi-view fusion but also provides downstream tasks with purer, more universal semantic encodings.
Specifically, the conditional entropy between representations $z$ and representations $h$ is defined as:
$$H(h \mid z) = -\mathbb{E}_{P_{h,z}}[\log P(h \mid z)]$$
where $H(h \mid z)$ quantifies the uncertainty of the representations $h$ given the representations $z$. It is computed as the negative expectation (over the joint distribution $P_{h,z}$) of the log-conditional probability $\log P(h \mid z)$. Minimizing $H(h \mid z)$ implies reducing the uncertainty of $h$ when $z$ is known.
Directly computing $P(h \mid z)$ is intractable, so a variational distribution $Q(h \mid z)$ is introduced to approximate $P(h \mid z)$. By maximizing the expectation of $\log Q(h \mid z)$ over $P_{h,z}$, we maximize a lower bound of the original objective $\mathbb{E}[\log P(h \mid z)]$. This leverages the Evidence Lower Bound principle from variational inference, as follows:
$$\mathbb{E}_{P_{h,z}}[\log P(h \mid z)] \geq \mathbb{E}_{P_{h,z}}[\log Q(h \mid z)]$$
Then, we assume $Q$ to be a Gaussian distribution $\mathcal{N}(h \mid \tilde{G}(z), \sigma I)$,
$$Q(h \mid z) = \mathcal{N}(h \mid \tilde{G}(z), \sigma I),$$
where $\tilde{G}(z)$ is the mean (parameterized by a cross-view mapping function) and $\sigma I$ is the covariance matrix. Substituting the Gaussian density into $\log Q(h \mid z)$ yields a log-likelihood term involving the squared error $\|h - \tilde{G}(z)\|^2$:
$$\max \; \mathbb{E}_{P_{h,z}}[\log Q(h \mid z)] = \max \; \mathbb{E}_{P_{h,z}}\left[\log \frac{1}{\sqrt{2\pi\sigma}} \exp\left(-\frac{\|h - \tilde{G}(z)\|^2}{2\sigma}\right)\right] = \max \; \mathbb{E}_{P_{h,z}}\left[-\frac{\|h - \tilde{G}(z)\|^2}{2\sigma} + \log \frac{1}{\sqrt{2\pi\sigma}}\right].$$
Expanding the Gaussian log-likelihood thus yields two terms: a squared-error term $\frac{\|h - \tilde{G}(z)\|^2}{2\sigma}$, which penalizes deviations of $\tilde{G}(z)$ from $h$, and a constant term $\log \frac{1}{\sqrt{2\pi\sigma}}$, which is independent of the optimization variables. Since constants do not affect the optimization, they can be ignored; then we have
$$\max \; \mathbb{E}_{P_{h,z}}\left[-\|h - \tilde{G}(z)\|_2^2\right].$$
Since the true data distribution is unknown, we utilize Monte Carlo estimation to approximate the expectation using finite samples $\{(h_i, z_i)\}_{i=1}^{N}$, resulting in the sample mean squared error:
$$\min \; \frac{1}{N} \sum_{i=1}^{N} \|h_i - \tilde{G}(z_i)\|_2^2.$$
Further, in cross-view learning, to enhance bidirectional consistency, the loss function is extended into a symmetric form:
$$\min \; \frac{1}{N} \sum_{i=1}^{N} \left( \|h_i - \tilde{G}(z_i)\|_2^2 + \|z_i - G(h_i)\|_2^2 \right)$$
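Under the Gaussian variational assumption, the symmetric objective reduces to two mean-squared errors computed through learned cross-view mappings. Below is a minimal sketch, assuming $G$ and $\tilde{G}$ are simple linear maps (the paper does not specify their architecture):

```python
import torch.nn as nn

class InvariantLoss(nn.Module):
    """Symmetric MSE surrogate for minimizing H(h|z) and H(z|h)."""
    def __init__(self, dim_z, dim_h):
        super().__init__()
        self.G_tilde = nn.Linear(dim_z, dim_h)  # cross-view map: z -> h space
        self.G = nn.Linear(dim_h, dim_z)        # cross-view map: h -> z space
        self.mse = nn.MSELoss()

    def forward(self, z, h):
        # each term tightens the variational bound on one conditional entropy
        return self.mse(self.G_tilde(z), h) + self.mse(self.G(h), z)
```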

2.3. Trustworthy Comment Recognition

Current toxic language detection models typically employ the softmax activation function for classification predictions. However, the softmax function merely converts the model’s logits into a probability distribution, lacking an estimation of decision uncertainty. This characteristic can cause the model to appear overly confident when dealing with complex or ambiguous inputs, even if its predictions may not be reliable. For instance, even when there is insufficient evidence to support the classification of certain samples, the probability values output by softmax can still be very high. This overconfidence may lead to erroneous decisions in practical applications, such as content moderation. To this end, a trustworthy comment recognition strategy is designed by introducing the uncertainty estimation into pattern decisions with the help of the Dirichlet function.
Specifically, the Dirichlet function is defined via $C$ parameters $\theta = [\theta_1, \ldots, \theta_C]$ as follows:
$$\mathrm{Dirichlet}(p \mid \theta) = \begin{cases} \frac{1}{\beta(\theta)} \prod_{i=1}^{C} p_i^{\theta_i - 1}, & p \in \Lambda_C \\ 0, & \text{otherwise} \end{cases}$$
$$\Lambda_C = \left\{ p \;\middle|\; \sum_{i=1}^{C} p_i = 1, \; 0 \leq p_1, \ldots, p_C \leq 1 \right\}$$
where $\beta(\theta)$ and $\Lambda_C$ represent the $C$-dimensional multivariate beta function and the simplex, respectively. To achieve the Dirichlet function, the Dirichlet network is designed using the softplus activation function instead of the softmax activation function to generate evidence representations $v = \{v_1, \ldots, v_C\}$. Subsequently, the parameters of the Dirichlet distribution are derived from the evidence representations as follows:
$$\theta_k = v_k + 1$$
After obtaining the parameters of the Dirichlet distribution, the uncertainty estimation $u$ of pattern decisions and the confidence level $e_k$ for each category are modeled as follows:
$$u = \frac{C}{\sum_{k=1}^{C} \theta_k}, \qquad e_k = \frac{v_k}{\sum_{k=1}^{C} \theta_k}$$
where $u$ and $e = \{e_1, \ldots, e_C\}$ are all non-negative and sum to one:
$$\sum_{k=1}^{C} e_k + u = 1$$
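The mapping from logits to evidence, Dirichlet parameters, uncertainty, and per-class confidence is mechanical; a minimal sketch following the equations above ($\theta_k = v_k + 1$, $u = C/\sum_k \theta_k$, $e_k = v_k/\sum_k \theta_k$):

```python
import torch.nn.functional as F

def dirichlet_outputs(logits):
    """Evidence head: logits -> (theta, per-class confidence e, uncertainty u)."""
    v = F.softplus(logits)                  # non-negative evidence, shape (B, C)
    theta = v + 1.0                         # Dirichlet parameters
    S = theta.sum(dim=-1, keepdim=True)     # Dirichlet strength
    u = logits.size(-1) / S.squeeze(-1)     # uncertainty per sample, shape (B,)
    e = v / S                               # confidences; e.sum(-1) + u == 1
    return theta, e, u
```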
Then, we infer the variational lower bound of the Dirichlet function as the loss function to guide pattern mining for toxic language detection. Given observations $\{x_i, y_i\}_{i=1}^{n}$, where $y_i$ denotes the corresponding label of the text $x_i$, the generative process is defined as follows:
$$p \sim P(p \mid x) = \mathrm{Dirichlet}(p \mid \theta), \qquad y \sim P(y \mid p)$$
According to the previous work [17], the marginal likelihood $\log P(y \mid x)$ is rewritten as:
$$\log P(y \mid x) = D_{KL}[Q(p \mid x) \,\|\, P(p \mid x, y)] + \mathcal{L}_d$$
where $Q(p \mid x)$ is instantiated as a neural network, and $\mathcal{L}_d$ is formulated as:
$$\mathcal{L}_d = \mathbb{E}_{Q(p \mid x)}[\log P(y \mid p)] - D_{KL}[Q(p \mid x) \,\|\, P(p \mid x)].$$
where $D_{KL}(\cdot \| \cdot)$ represents the Kullback–Leibler divergence, a measure of how one probability distribution diverges from a second, expected probability distribution. This divergence is inherently non-negative, reflecting the information loss when approximating one distribution with another, and ensures that $\mathcal{L}_d$ serves as a robust lower bound for the marginal likelihood $\log P(y \mid x)$. Thus, we optimize $\mathcal{L}_d$ to achieve marginal likelihood maximization. More specifically, we encapsulate the integral of the cross-entropy loss under the Dirichlet distribution $\mathrm{Dirichlet}(p \mid \theta)$ as the first term of $\mathcal{L}_d$:
$$\mathbb{E}_{Q(p \mid x)}[\log P(y \mid p)] = \sum_{k=1}^{C} y_k \left( \psi(\theta_k) - \psi\Big(\sum_{c=1}^{C} \theta_c\Big) \right)$$
where $\psi(\cdot)$ is the digamma function and $y_k$ denotes the $k$-th element of the one-hot label $y$. For the second term of $\mathcal{L}_d$, a prior constraint is introduced so that a good Dirichlet distribution concentrates on the vertices of the simplex; this implies that the evidence should be as close to 0 as possible (i.e., the Dirichlet parameters as close to 1 as possible) for every category except the correct label:
$$D_{KL}[Q(p \mid x) \,\|\, P(p \mid x)] = D_{KL}\left( \mathrm{Dirichlet}(p \mid \tilde{\theta}) \,\|\, \mathrm{Dirichlet}(p \mid [1, \ldots, 1]) \right)$$
where $\tilde{\theta} = y + (1 - y) \odot \theta$ and $[1, \ldots, 1]$ denotes a vector of ones.
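A sketch of the resulting training objective, written as a minimized negative of the bound above: the expected cross-entropy under the Dirichlet (via the digamma function) plus the KL prior that pushes non-target evidence toward the uniform Dirichlet $[1, \ldots, 1]$. The KL term uses the standard closed form for Dirichlet distributions:

```python
import torch

def kl_to_uniform_dirichlet(alpha):
    """Closed-form KL( Dirichlet(p|alpha) || Dirichlet(p|[1,...,1]) )."""
    C = alpha.size(-1)
    S = alpha.sum(dim=-1, keepdim=True)
    term1 = (torch.lgamma(S).squeeze(-1) - torch.lgamma(alpha).sum(-1)
             - torch.lgamma(torch.tensor(float(C))))
    term2 = ((alpha - 1.0) * (torch.digamma(alpha) - torch.digamma(S))).sum(-1)
    return term1 + term2

def dirichlet_loss(theta, y_onehot):
    """Expected cross-entropy under Dirichlet(theta) plus the KL prior term."""
    S = theta.sum(dim=-1, keepdim=True)
    nll = (y_onehot * (torch.digamma(S) - torch.digamma(theta))).sum(-1)
    theta_tilde = y_onehot + (1.0 - y_onehot) * theta  # strip true-class evidence
    return (nll + kl_to_uniform_dirichlet(theta_tilde)).mean()
```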

2.4. Evidence-Based Information Fusion

When obtaining decision-making information from two views, we often use view-based weighting methods to integrate their complementary information, which helps improve decision-making accuracy. Nevertheless, conventional weighting methods are commonly static. Once set, the weights remain fixed throughout the decision-making process without adapting to individual sample characteristics. In other words, these fusion methods overlook that different samples may have varying significance across views, which can cause the decision-making performance to deteriorate. To this end, we model the uncertainty of the Dirichlet distribution in each sample of each view to integrate multi-view decision-making information based on Evidence Theory.
Specifically, after obtaining the uncertainty estimations $u^l$ and $u^b$ of pattern decisions and the confidence levels $e_k^l$ and $e_k^b$ for each category, the evidence-based fusion rule is defined as follows:
$$e_k^f = \frac{1}{1 - \sum_{i \neq j} e_i^l e_j^b} \left( e_k^l e_k^b + e_k^l u^b + e_k^b u^l \right), \qquad u^f = \frac{1}{1 - \sum_{i \neq j} e_i^l e_j^b} \, u^l u^b$$
where $u^f$ and $e_k^f$ denote the fused uncertainty estimation and the fused confidence level, respectively. The corresponding fused Dirichlet distribution $\mathrm{Dirichlet}(p^f \mid \theta^f)$ is obtained through:
$$\theta_k^f = e_k^f \times C / u^f + 1.$$
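A sketch of this fusion rule for two views; it follows the reduced Dempster combination used in trusted multi-view classification [17], with the conflicting evidence mass $\sum_{i \neq j} e_i^l e_j^b$ normalized away:

```python
import torch

def fuse_two_views(e_l, u_l, e_b, u_b):
    """Fuse per-class confidences (B, C) and uncertainties (B,) of two views."""
    C = e_l.size(-1)
    cross = torch.bmm(e_l.unsqueeze(2), e_b.unsqueeze(1))      # (B, C, C) products
    conflict = cross.sum(dim=(1, 2)) - cross.diagonal(dim1=1, dim2=2).sum(-1)
    norm = (1.0 - conflict).unsqueeze(-1)                      # 1 - sum_{i != j}
    e_f = (e_l * e_b + e_l * u_b.unsqueeze(-1) + e_b * u_l.unsqueeze(-1)) / norm
    u_f = (u_l * u_b) / norm.squeeze(-1)
    theta_f = e_f * (C / u_f.unsqueeze(-1)) + 1.0              # fused Dirichlet
    return e_f, u_f, theta_f
```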
Then, a multi-view evidence-based fusion loss is designed to achieve an optimized fused Dirichlet distribution, as follows:
$$\mathcal{L}_f = \sum_{k=1}^{C} y_k \left( \psi(\theta_k^f) - \psi\Big(\sum_{c=1}^{C} \theta_c^f\Big) \right) + D_{KL}\left( \mathrm{Dirichlet}(p^f \mid \tilde{\theta}^f) \,\|\, \mathrm{Dirichlet}(p^f \mid [1, \ldots, 1]) \right)$$
The evidence-based information fusion component, in conjunction with a Dirichlet parameterization-based classification network and evidence fusion theory, is capable of dynamically identifying views that pose risks to decision making and leveraging informative views in the final decision. This enables the model to make accurate classification decisions even when faced with diverse sustainable development insights and complex sample conditions.

2.5. The Overall Loss Function

We define the following loss $\mathcal{L}$ to train the proposed method and obtain the final toxic language detection results:
$$\mathcal{L} = \mathcal{L}_h + \alpha \mathcal{L}_d + \beta \mathcal{L}_f$$
where $\mathcal{L}_h = \mathcal{L}_h^l + \mathcal{L}_h^b$ denotes the entropy-oriented invariant learning loss that guides the multi-view representation extraction, $\mathcal{L}_d = \mathcal{L}_d^l + \mathcal{L}_d^b$ denotes the trustworthy comment recognition loss that guides the toxic language detection, and $\mathcal{L}_f$ denotes the evidence-based information fusion loss that guides the sample-level decision aggregation. $\alpha$ and $\beta$ denote trade-off parameters.
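Putting the pieces together, one training step under the overall objective might look as follows. This sketch reuses the helper sketches above; the linear evidence heads `head_l` and `head_b`, the optimizer, and the batch layout are assumptions, and $\alpha = \beta = 1$ follows the parameter analysis in Section 3.3:

```python
def train_step(batch, encoder, head_l, head_b, inv_loss, optimizer,
               alpha=1.0, beta=1.0):
    input_ids, attention_mask, y_onehot = batch
    z, h = encoder(input_ids, attention_mask)          # global / local views
    loss_h = inv_loss(z, h)                            # entropy-oriented invariance
    theta_l, e_l, u_l = dirichlet_outputs(head_l(h))   # BiLSTM-view evidence
    theta_b, e_b, u_b = dirichlet_outputs(head_b(z))   # BERT-view evidence
    loss_d = dirichlet_loss(theta_l, y_onehot) + dirichlet_loss(theta_b, y_onehot)
    _, _, theta_f = fuse_two_views(e_l, u_l, e_b, u_b)
    loss_f = dirichlet_loss(theta_f, y_onehot)         # fused-view loss
    loss = loss_h + alpha * loss_d + beta * loss_f
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```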
In summary, the proposed method for toxic language detection in sustainable development insights offers several significant advantages. By integrating multi-view feature augmentation, entropy-oriented invariant learning, trustworthy comment recognition, and evidence-based information fusion, it comprehensively addresses the limitations of traditional approaches. The dual-stream feature extraction framework combining BiLSTM and BERT ensures rich and robust feature representation from both local and global perspectives. The entropy-oriented invariant learning strategy enhances the model’s generalization ability by aligning semantic representations across different views, effectively handling semantic inconsistencies. The introduction of uncertainty estimation through the Dirichlet function in the trustworthy comment recognition component provides a more reliable decision-making process, quantifying the confidence of predictions. Furthermore, the evidence-based information fusion mechanism dynamically aggregates multi-view decisions, adapting to individual sample characteristics and reducing overconfidence in predictions. Collectively, these components make the proposed method more accurate, reliable, and well-suited for detecting toxic language in the complex and diverse contexts of sustainable development insights, advancing the state of the art in this critical area of research.

3. Experimental Evaluation

3.1. Setup

Datasets: Following previous works [27,28,29,30], two common datasets, namely SustODS-PT and ToxiLuso-EC, are employed to evaluate the performance of the proposed method in the toxic language detection task. SustODS-PT comprises 10,000 sustainable development insight texts discussing sustainable development topics within the Portuguese-speaking community. These comments are collected from diverse digital systems and cover a wide range of viewpoints and expressions related to sustainability initiatives. Similarly, ToxiLuso-EC contains 6497 sustainable development insight texts focusing on sustainable development content, gathered from various online forums and social networks where Portuguese is predominantly used. Both datasets are formatted as binary classification datasets. In these datasets, label 1 is assigned to text content that conveys negative emotions, such as anger or frustration, negative evaluations like criticism or disapproval, or more intense emotions such as threats or hate speech. Conversely, label 0 is assigned to text content that is neutral, expressing factual information without strong emotional language, ordinary statements that do not lean towards any extreme sentiment, or somewhat positive text content that reflects support, approval, or optimism regarding sustainable development initiatives. This binary classification setup allows for a clear distinction between toxic and non-toxic language, facilitating the evaluation of the proposed method’s effectiveness in identifying harmful content that could undermine constructive discourse on sustainability.
Metrics: Following previous works [22,23,31], Precision, Recall, F1, and Accuracy are utilized as metrics to evaluate the performance of the proposed method on the two above datasets. Precision measures the proportion of correctly predicted toxic language instances among all instances predicted as toxic, reflecting the accuracy of positive predictions. Recall quantifies the percentage of actual toxic language instances correctly identified by the model, indicating its ability to detect true positives. The F1 score balances Precision and Recall through their harmonic mean, offering a comprehensive metric of the model’s performance in handling imbalanced datasets. Accuracy represents the ratio of correctly predicted instances to the total instances predicted, providing an overall effectiveness measure of the model across both toxic and non-toxic language detection.
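All four metrics are standard and available in scikit-learn; a small self-contained check (the arrays are toy values, not the paper's predictions):

```python
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # toy gold labels (1 = toxic)
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # toy predictions
print(precision_score(y_true, y_pred))  # 0.75: 3 of 4 predicted-toxic are toxic
print(recall_score(y_true, y_pred))     # 0.75: 3 of 4 toxic posts recovered
print(f1_score(y_true, y_pred))         # 0.75: harmonic mean of the two
print(accuracy_score(y_true, y_pred))   # 0.75: 6 of 8 labels correct
```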
Implementation Details: Our method is implemented with the open-source framework PyTorch 2.0.1 and run on an Ubuntu 20.04 server equipped with an NVIDIA Tesla V100 GPU. In the experiments, the AdamW optimizer with a learning rate of $5 \times 10^{-4}$ is selected for optimizing the overall network. In the optimization, the epoch number and the batch size are set to 200 and 16, respectively. Following previous works [32,33,34,35], the training set, the validation set, and the test set account for 60%, 20%, and 20% of the datasets, respectively. When the validation loss fails to improve for 5 consecutive epochs, training is stopped early. To ensure the reproducibility of the results, a fixed random seed of 42 is used throughout all experiments. The code and datasets are available at: https://github.com/liumeng1541/toldbr-bert-text-classification-pt-br (accessed on 6 June 2025).
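For reproducibility, the reported 60%/20%/20% split with seed 42 can be realized as in the following sketch; `texts` and `labels` are assumed lists of posts and binary labels, and the stratification is our addition to preserve class ratios:

```python
from sklearn.model_selection import train_test_split

X_train, X_tmp, y_train, y_tmp = train_test_split(
    texts, labels, test_size=0.4, random_state=42, stratify=labels)
X_val, X_test, y_val, y_test = train_test_split(
    X_tmp, y_tmp, test_size=0.5, random_state=42, stratify=y_tmp)
```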

3.2. Comparison with Baselines

Baselines: Eleven baseline toxic text detection methods are used for comparison, which can be partitioned into five groups: convolutional neural network-based methods, i.e., GCNN-TTD [32] and RCNN-TTD [27]; Transformer-based methods, i.e., TTD [13]; LSTM-based methods, i.e., ToXCL [7] and TTD [14]; BERT-based methods, i.e., TLD [1], FFDC [11], MMTLD [9], and CA-MTL [15]; and ensemble-based learning methods, i.e., RTLD [8] and ALD [16]. Three strategies are used for fair comparison in the experiments: (1) The experiment environments for all comparison methods are the same. (2) The grid search trick is used on the trade-off parameters suggested by the corresponding comparison methods to obtain the best performance. (3) The experiment results are the average results of five runs for all comparison methods.
Results. The experiment results are presented in Table 1 and Figure 2. It can be observed that our proposed method demonstrates superior performance across both datasets in all four metrics. On the SustODS-PT dataset, our method achieves the highest Precision of 0.7620, Recall of 0.7584, F1 score of 0.7632, and Accuracy of 0.7548. Similarly, on the ToxiLuso-EC dataset, our method attains the best Precision of 0.7974, Recall of 0.7816, F1 score of 0.7852, and Accuracy of 0.7840. Compared to the baseline methods, our approach shows significant improvements. For instance, on the SustODS-PT dataset, our method outperforms the second-best results by 0.0348 in Precision, 0.0282 in Recall, 0.0316 in F1 score, and 0.0184 in Accuracy, respectively.
Analysis. The effectiveness of the proposed method can be further explained by its unique components. The multi-view feature augmentation strategy leverages the complementary strengths of BERT and BiLSTM to enhance feature representation. BERT’s global semantic modeling captures deep semantic structures through self-attention, while local contextual dynamics of BiLSTM extract sequential patterns and handle fragmented expressions. The entropy-oriented invariant learning addresses semantic inconsistencies across views by minimizing conditional entropy, ensuring that representations from different perspectives are aligned in latent space. The trustworthy comment recognition strategy introduces uncertainty estimation through the Dirichlet function, providing a more reliable decision-making process by quantifying decision uncertainty. Finally, the evidence-based information fusion dynamically integrates multi-view decision-making information, adapting to individual sample characteristics and reducing the risk of overconfidence in predictions. These components collectively contribute to the superior performance of our method in toxic text detection for sustainable development insights.

3.3. Parameter Analysis

This subsection explores the impact of the trade-off parameters $\alpha$ and $\beta$ on the detection performance through sensitivity analysis experiments. Specifically, the values of $\alpha$ and $\beta$ are restricted to the set $\{1, 0.1, 0.01, 0.001, 0.0001\}$. By fixing one parameter at a specific value and varying the other, the changing trends of the classification performance are recorded in Figure 3. The results show that when $\alpha$ and $\beta$ are set to 1, the model achieves the best overall performance in terms of Precision, Recall, F1, and Accuracy. When the values of $\alpha$ and $\beta$ are too small (e.g., 0.001 or 0.0001), the model tends to underfit, resulting in lower detection accuracy. Conversely, when the values are too large, the model may overfit to the training data, leading to poor generalization on the test set. Therefore, in the experiments, both $\alpha$ and $\beta$ are set to 1.
The experimental results presented in Figure 4 show the influence of different learning rates (r) on the performance of the proposed method on the SustODS-PT dataset. The learning rate is a crucial hyperparameter that affects how quickly and effectively the model converges during training. A suitable learning rate can help the model achieve better performance in terms of Precision, Recall, F1, and Accuracy. When the learning rate is set to 0.0001, the model demonstrates relatively balanced performance across all metrics. However, as the learning rate increases to 0.001, there is a noticeable improvement in Precision and Recall, indicating that the model can better distinguish between toxic and non-toxic language at this rate. Further increasing the learning rate to 0.01 leads to a slight decrease in performance, suggesting that the model may start to overshoot the optimal parameters. When the learning rate is set to 0.1, the performance drops significantly across all metrics, which may be due to the model becoming unstable during training and failing to converge properly. Therefore, the learning rate should be carefully tuned to balance the speed of convergence and the quality of the final model. In practice, the learning rate is chosen from the range $[0.0001, 0.001]$ to achieve optimal results on the two datasets.

3.4. Ablation Study

The ablation study in Table 2 evaluates the contributions of different components of the proposed method on the SustODS-PT dataset. The study systematically investigates the impact of individual and combined loss functions on the model’s performance.
In the single-view setting, using only L h results in lower performance across all metrics, indicating that entropy-oriented invariant learning alone is insufficient for capturing the complexities of toxic language detection. When only L d is used, there is a significant improvement in Precision and Recall compared to L h , highlighting the effectiveness of the trustworthy comment recognition component in isolation. However, the performance is still suboptimal, suggesting that combining multi-view perspectives is essential for better representation learning.
The dual-view experiments demonstrate that combining L h and L d leads to a notable performance boost compared to single-view approaches. This indicates that integrating both entropy-oriented invariant learning and trustworthy comment recognition leverages the complementary strengths of global semantic modeling and local contextual dynamics, enhancing the model’s ability to distinguish toxic language patterns. Further improvements are observed when evidence-based information fusion ( L f ) is added to either L h or L d , with the combination of L d + L f yielding higher Precision and Recall. This underscores the value of dynamically aggregating multi-view decision-making information based on uncertainty estimation, which helps in making more reliable classification decisions.
The complete integration of all three losses ( L h + L d + L f ) achieves the highest performance across all metrics, with Precision, Recall, F1-score, and Accuracy reaching 0.7620, 0.7584, 0.7632, and 0.7548, respectively. This comprehensive approach not only captures rich semantic information from multiple perspectives but also effectively fuses decisions through evidence theory, leading to the most robust and accurate toxic language detection. The results confirm that each component plays a crucial role in the overall effectiveness of the proposed method.

3.5. Statistical Analysis

To assess the statistical significance of the performance differences between our method and the baselines, we conducted the Nemenyi test [36] on both datasets. The results are visualized in Figure 5. The Nemenyi test is a post hoc test used after the Friedman test to determine if the performance differences between algorithms are statistically significant. In the figure, each line represents the average rank of a method across different metrics, along with its confidence interval. If the confidence intervals of two methods do not overlap, their performance difference is considered statistically significant. On the SustODS-PT dataset, our method achieved the highest average rank, and its confidence interval does not overlap with those of most baseline methods except for ALD. Similarly, on the ToxiLuso-EC dataset, our method also obtained the highest average rank, and its confidence interval does not overlap with those of all baseline methods except for RTLD and ALD. This indicates that our method significantly outperforms the majority of the baseline methods. The results of the Nemenyi test further confirm the superiority of our method in toxic language detection on both datasets.
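The Friedman test that precedes the Nemenyi post hoc analysis can be run with SciPy; the sketch below uses the four SustODS-PT metric scores of three methods from Table 1 as the repeated measures (the Nemenyi step itself requires a separate post hoc package and is omitted here):

```python
from scipy.stats import friedmanchisquare

ours = [0.7620, 0.7584, 0.7632, 0.7548]  # Ours, SustODS-PT (Table 1)
ald  = [0.7239, 0.7243, 0.7316, 0.7364]  # ALD
ttd  = [0.7272, 0.7293, 0.7370, 0.7354]  # TTD
stat, p = friedmanchisquare(ours, ald, ttd)
print(f"Friedman chi-square = {stat:.3f}, p = {p:.3f}")
```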

4. Conclusions

In this paper, we have proposed a novel and comprehensive method for detecting toxic language in the Portuguese community. Our approach addresses the limitations of traditional methods by integrating multi-view feature augmentation, entropy-oriented invariant learning, trustworthy comment recognition, and evidence-based information fusion. Through extensive experiments on real-world datasets, we have demonstrated that our method achieves superior performance compared to existing state-of-the-art techniques. The results highlight the effectiveness of our dual-stream framework combining BiLSTM and BERT, which captures comprehensive semantic features and enhances generalization. Future research will concentrate on refining our model through integration with large language models (LLMs) via prompt tuning. We will investigate designing effective prompts to enhance LLMs’ comprehension of the contextual and nuanced aspects of Portuguese text. Merging our dual-stream framework with LLMs aims to improve the model’s handling of complex linguistic patterns and its adaptability to diverse cultural and contextual environments. This integration, utilizing LLMs’ extensive knowledge and language capabilities, promises more precise and reliable detection outcomes. Furthermore, we will optimize the prompt tuning process to align with our model’s objectives and constraints, experimenting with various prompt formats, lengths, and styles. Our goal is to create a robust and scalable solution applicable to toxic language detection tasks in Portuguese and multilingual settings. By combining our method’s strengths with LLMs’ advancements, we aim to significantly progress in toxic language detection and support inclusive digital discourse.

Author Contributions

Methodology, J.Z.; Writing—original draft, W.F.; Writing—review & editing, H.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Leite, J.A.; Silva, D.; Bontcheva, K.; Scarton, C. Toxic Language Detection in Social Media for Brazilian Portuguese: New Dataset and Multilingual Analysis. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing, Online, 4–7 December 2020; pp. 914–924. [Google Scholar]
  2. Sap, M.; Swayamdipta, S.; Vianna, L.; Zhou, X.; Choi, Y.; Smith, N.A. Annotators with Attitudes: How Annotator Beliefs And Identities Bias Toxic Language Detection. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 5884–5906. [Google Scholar]
  3. Mahmud, T.; Ptaszynski, M.; Eronen, J.; Masui, F. Cyberbullying detection for low-resource languages and dialects: Review of the state of the art. Inf. Process. Manag. 2023, 60, 103454. [Google Scholar] [CrossRef]
  4. Gao, J.; Liu, M.; Li, P.; Zhang, J.; Chen, Z. Deep Multiview Adaptive Clustering with Semantic Invariance. IEEE Trans. Neural Netw. Learn. Syst. 2024, 35, 12965–12978. [Google Scholar] [CrossRef] [PubMed]
  5. Gao, J.; Liu, M.; Li, P.; Laghari, A.A.; Javed, A.R.; Victor, N.; Gadekallu, T.R. Deep Incomplete Multiview Clustering via Information Bottleneck for Pattern Mining of Data in Extreme-Environment IoT. IEEE Internet Things J. 2023, 11, 26700–26712. [Google Scholar] [CrossRef]
  6. Bensalem, I.; Rosso, P.; Zitouni, H. Toxic language detection: A systematic review of Arabic datasets. Expert Syst. 2024, 41, e13551. [Google Scholar] [CrossRef]
  7. Hoang, N.; Long, D.; Do, D.A.; Vu, D.A.; Tuan, L.A. ToXCL: A Unified Framework for Toxic Speech Detection and Explanation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), Mexico City, Mexico, 16–21 June 2024; pp. 6460–6472. [Google Scholar]
  8. Sharif, O.; Hoque, M.M. Tackling cyber-aggression: Identification and fine-grained categorization of aggressive texts on social media using weighted ensemble of transformers. Neurocomputing 2022, 490, 462–481. [Google Scholar] [CrossRef]
  9. Song, R.; Giunchiglia, F.; Li, Y.; Shi, L.; Xu, H. Measuring and mitigating language model biases in abusive language detection. Inf. Process. Manag. 2023, 60, 103277. [Google Scholar] [CrossRef]
  10. Bespalov, D.; Bhabesh, S.; Xiang, Y.; Zhou, L.; Qi, Y. Towards Building a Robust Toxicity Predictor. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 5: Industry Track), Toronto, ON, Canada, 9–14 July 2023; pp. 581–598. [Google Scholar]
  11. Lu, J.; Xu, B.; Zhang, X.; Min, C.; Yang, L.; Lin, H. Facilitating Fine-Grained Detection of Chinese Toxic Language: Hierarchical Taxonomy, Resources, and Benchmarks. In Proceedings of the 61st Annual Meeting Of The Association For Computational Linguistics, Toronto, ON, Canada, 9–14 July 2023. [Google Scholar]
  12. Trajano, D.; Bordini, R.H.; Vieira, R. OLID-BR: Offensive language identification dataset for Brazilian Portuguese. Lang. Resour. Eval. 2024, 58, 1263–1289. [Google Scholar] [CrossRef]
  13. Damás, G.; Torres Anchiêta, R.; Santos Moura, R.; Ponte Machado, V. A Transformer-Based Tabular Approach to Detect Toxic Comments. In Proceedings of the Brazilian Conference on Intelligent Systems, Belém, Brazil, 17–21 November 2024; pp. 18–30. [Google Scholar]
  14. Maity, A.; More, R.; Patil, A.; Oza, J.; Kambli, G. Toxic comment detection using bidirectional sequence classifiers. In Proceedings of the 2024 2nd International Conference on Intelligent Data Communication Technologies and Internet of Things (IDCIoT), Bengaluru, India, 4–6 January 2024; pp. 709–716. [Google Scholar]
  15. Nelatoori, K.B.; Kommanti, H.B. Toxic comment classification and rationale extraction in code-mixed text leveraging co-attentive multi-task learning. Lang. Resour. Eval. 2025, 59, 161–190. [Google Scholar] [CrossRef]
  16. Khan, A.; Ahmed, A.; Jan, S.; Bilal, M.; Zuhairi, M.F. Abusive language detection in urdu text: Leveraging deep learning and attention mechanism. IEEE Access 2024, 12, 37418–37431. [Google Scholar] [CrossRef]
  17. Han, Z.; Zhang, C.; Fu, H.; Zhou, J.T. Trusted multi-view classification with dynamic evidential fusion. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 45, 2551–2566. [Google Scholar] [CrossRef]
  18. Makhnytkina, O.; Matveev, A.; Bogoradnikova, D.; Lizunova, I.; Maltseva, A.; Shilkina, N. Detection of toxic language in short text messages. In Proceedings of the International Conference on Speech and Computer, St. Petersburg, Russia, 7–9 October 2020; pp. 315–325. [Google Scholar]
  19. Song, G.; Huang, D.; Xiao, Z. A study of multilingual toxic text detection approaches under imbalanced sample distribution. Information 2021, 12, 205. [Google Scholar] [CrossRef]
  20. Prudhvish, N.; Nagarajan, G.; Bharath Kumar, U.; Harsha Vardhan, B.; Tharun Kumar, L. DeTox: A WebApp for Toxic Comment Detection and Moderation. In Proceedings of the 2024 International Conference on Trends in Quantum Computing and Emerging Business Technologies, Pune, India, 22–23 March 2024; pp. 1–5. [Google Scholar]
  21. Warner, M.; Strohmayer, A.; Higgs, M.; Coventry, L. A critical reflection on the use of toxicity detection algorithms in proactive content moderation systems. Int. J. Hum.-Comput. Stud. 2025, 198, 103468. [Google Scholar] [CrossRef]
  22. Heung, S.; Jiang, L.; Azenkot, S.; Vashistha, A. “Ignorance is not Bliss”: Designing Personalized Moderation to Address Ableist Hate on Social Media. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, Yokohama, Japan, 26 April–1 May 2025; pp. 1–18. [Google Scholar]
  23. Warford, N.; Farber, N.; Mazurek, M.L. How entertainment journalists manage online hate and harassment. In Proceedings of the Twentieth Symposium on Usable Privacy and Security (SOUPS 2024), Philadelphia, PA, USA, 11–13 August 2024; pp. 279–295. [Google Scholar]
  24. Mishra, S.; Chatterjee, P. Exploring chatgpt for toxicity detection in github. In Proceedings of the 2024 ACM/IEEE 44th International Conference on Software Engineering: New Ideas and Emerging Results, Lisbon, Portugal, 14–20 April 2024; pp. 6–10. [Google Scholar]
  25. Li, L.; Fan, L.; Atreja, S.; Hemphill, L. “HOT” ChatGPT: The promise of ChatGPT in detecting and discriminating hateful, offensive, and toxic comments on social media. ACM Trans. Web 2024, 18, 30. [Google Scholar] [CrossRef]
  26. He, X.; Zannettou, S.; Shen, Y.; Zhang, Y. You only prompt once: On the capabilities of prompt learning on large language models to tackle toxic content. In Proceedings of the 2024 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 20–22 May 2024; pp. 770–787. [Google Scholar]
  27. Abbasi, A.; Javed, A.R.; Iqbal, F.; Kryvinska, N.; Jalil, Z. Deep learning for religious and continent-based toxic content detection and classification. Sci. Rep. 2022, 12, 17478. [Google Scholar] [CrossRef]
  28. Dessì, D.; Recupero, D.R.; Sack, H. An assessment of deep learning models and word embeddings for toxicity detection within online textual comments. Electronics 2021, 10, 779. [Google Scholar] [CrossRef]
  29. Lima, L.H.Q.; Pagano, A.S.; da Silva, A.P.C. Toxic Content Detection in online social networks: A new dataset from Brazilian Reddit Communities. In Proceedings of the 16th International Conference on Computational Processing of Portuguese, Santiago de Compostela, Spain, 12–15 March 2024; pp. 472–482. [Google Scholar]
  30. Babakov, N.; Logacheva, V.; Panchenko, A. Beyond plain toxic: Building datasets for detection of flammable topics and inappropriate statements. Lang. Resour. Eval. 2024, 58, 459–504. [Google Scholar] [CrossRef]
  31. Tangwaragorn, P.; Kar, W. The implications of account suspensions on online discussion platforms. Decis. Support Syst. 2025, 189, 114389. [Google Scholar] [CrossRef]
  32. Asif, M.; Al-Razgan, M.; Ali, Y.A.; Yunrong, L. Graph convolution networks for social media trolls detection use deep feature extraction. J. Cloud Comput. 2024, 13, 33. [Google Scholar] [CrossRef]
  33. Rupapara, V.; Rustam, F.; Shahzad, H.F.; Mehmood, A.; Ashraf, I.; Choi, G.S. Impact of SMOTE on imbalanced text features for toxic comments classification using RVVC model. IEEE Access 2021, 9, 78621–78634. [Google Scholar] [CrossRef]
  34. Zhan, Z.; Mao, X.; Liu, H.; Yu, S. STGL: Self-Supervised Spatio-Temporal Graph Learning for Traffic Forecasting. J. Artif. Intell. Res. 2025, 2, 1–8. [Google Scholar] [CrossRef]
  35. Li, P.; Gao, J.; Zhang, J.; Jin, S.; Chen, Z. Deep Reinforcement Clustering. IEEE Trans. Multimed. 2022, 25, 8183–8193. [Google Scholar] [CrossRef]
  36. Zhang, W.; Wang, J. English Text Sentiment Analysis Network based on CNN and U-Net. J. Sci. Eng. 2024, 1, 13–18. [Google Scholar] [CrossRef]
Figure 1. The overall architecture of the proposed method. It utilizes BiLSTM and BERT to generate view-specific representations, $h$ and $z$, from sustainable development insight texts. Next, it performs entropy-oriented invariant learning between $z$ and $h$ to learn consistent information, and then constructs the Dirichlet network to implement the Dirichlet function via $\theta_k = v_k + 1$, which yields the uncertainty estimations $u^l$ and $u^b$ of pattern decisions and the confidence levels $e_k^l$ and $e_k^b$ for each category in each view. Finally, it utilizes evidence-based information fusion to produce the final pattern decision for toxic text detection.
Figure 2. ACC on two datasets using different methods.
Figure 3. Parameter analysis of $\alpha$ and $\beta$ on the SustODS-PT dataset.
Figure 4. Parameter analysis of learning rate r on the SustODS-PT dataset.
Figure 5. Nemenyi test on the SustODS-PT and ToxiLuso-EC datasets.
Table 1. Comparison results on two datasets across four metrics (Precision / Recall / F1 / Acc). Bold indicates the best results.

| Method | SustODS-PT (P / R / F1 / Acc) | ToxiLuso-EC (P / R / F1 / Acc) |
|---|---|---|
| GCNN-TTD | 0.5457 / 0.5386 / 0.5451 / 0.5689 | 0.5997 / 0.5912 / 0.5936 / 0.6002 |
| RCNN-TTD | 0.5274 / 0.5224 / 0.5249 / 0.5487 | 0.5547 / 0.5555 / 0.5880 / 0.5786 |
| TLD | 0.4711 / 0.5091 / 0.4977 / 0.4721 | 0.4777 / 0.5085 / 0.4880 / 0.4786 |
| ToXCL | 0.6992 / 0.7177 / 0.7311 / 0.6963 | 0.6490 / 0.5905 / 0.6267 / 0.5782 |
| TTD | 0.7272 / 0.7293 / 0.7370 / 0.7354 | 0.7590 / 0.7412 / 0.7446 / 0.7457 |
| RTLD | 0.7083 / 0.7302 / 0.7397 / 0.7176 | 0.6611 / 0.5982 / 0.6424 / 0.6049 |
| MMTLD | 0.5347 / 0.5310 / 0.5480 / 0.5552 | 0.4536 / 0.4474 / 0.4708 / 0.4808 |
| TBTLD | 0.6011 / 0.6276 / 0.6300 / 0.5960 | 0.5231 / 0.5377 / 0.5479 / 0.5320 |
| FFDC | 0.6979 / 0.6987 / 0.7088 / 0.7120 | 0.6252 / 0.6163 / 0.6484 / 0.6134 |
| ALD | 0.7239 / 0.7243 / 0.7316 / 0.7364 | 0.7461 / 0.7332 / 0.7579 / 0.7520 |
| CA-MTL | 0.7165 / 0.7094 / 0.7086 / 0.7126 | 0.7444 / 0.7456 / 0.7522 / 0.7489 |
| Ours | **0.7620** / **0.7584** / **0.7632** / **0.7548** | **0.7974** / **0.7816** / **0.7852** / **0.7840** |
Table 2. Ablation study of distinct losses on the SustODS-PT dataset.

| Component | Variant | Precision | Recall | F1-Score | Acc |
|---|---|---|---|---|---|
| Single-view | $\mathcal{L}_h$ | 0.4932 | 0.5065 | 0.4715 | 0.4985 |
| Single-view | $\mathcal{L}_d$ | 0.6512 | 0.6641 | 0.6434 | 0.6575 |
| Dual-view | $\mathcal{L}_h + \mathcal{L}_d$ | 0.7087 | 0.6977 | 0.6898 | 0.7022 |
| Dual-view | $\mathcal{L}_h + \mathcal{L}_f$ | 0.6590 | 0.6482 | 0.6424 | 0.6513 |
| Dual-view | $\mathcal{L}_d + \mathcal{L}_f$ | 0.7444 | 0.7324 | 0.7361 | 0.7365 |
| Full | $\mathcal{L}_h + \mathcal{L}_d + \mathcal{L}_f$ | 0.7620 | 0.7584 | 0.7632 | 0.7548 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
