Knowledge-Guided Heterogeneous Graph Convolutional Network for Aspect-Based Sentiment Analysis

Abstract: The purpose of aspect-based sentiment analysis (ABSA) is to determine the sentiment polarity of aspects in a given sentence. Most prior works on sentiment analysis used complex and inefficient methods to integrate external knowledge. Furthermore, they fell short of fully utilizing BERT's potential because, when generating word embeddings, they merely averaged the BERT subword vectors. To overcome these limitations, we propose a knowledge-guided heterogeneous graph convolutional network for aspect-based sentiment analysis (KHGCN). Specifically, we merge subword vectors using a dynamic weighting mechanism in the BERT embedding layer. Additionally, heterogeneous graphs are constructed to fuse different feature associations between words, and graph convolutional networks are utilized to identify context-specific syntactic features. Furthermore, by embedding a knowledge graph, the model can learn additional features from sources other than the corpus. On this basis, a richer knowledge representation for a particular aspect can be obtained through the attention mechanism. Finally, semantic features, syntactic features, and knowledge are dynamically combined using feature fusion. Experiments on three public datasets demonstrate that our model achieves accuracy rates of 80.87%, 85.42%, and 91.07%, improvements of more than 2% over other benchmark models based on HGCNs and BERT.


Introduction
Aspect-based sentiment analysis (ABSA) is a crucial task in sentiment analysis and has become an increasingly popular subject in natural language processing research [1,2]. ABSA determines the sentiment polarity (positive, neutral, or negative) of an aspect term, given the sentence and the aspect term. For example, when someone states "Great food, but the environment is so bad!", the sentiment polarities for the aspects food and environment are opposite, as seen in Figure 1. In this regard, ABSA outperforms sentence-level sentiment analysis in determining the polarity of a specific aspect [3]. Most early ABSA studies utilized neural networks to extract sentiment information associated with specific aspects within a textual context [4][5][6][7]. Subsequently, ABSA architectures using attention mechanisms [8] and pre-trained models [9] became popular approaches [10,11]. BERT is instrumental in transforming input text into a nuanced semantic representation, while the attention mechanism highlights the parts of a sentence that pertain to a particular aspect by focusing on contextual relationships. Although models founded on pre-training and attention mechanisms have achieved commendable classification accuracy in ABSA tasks, a notable limitation is the simplistic practice of averaging subword vectors to create word-level embeddings when applying BERT [12,13]. This practice restricts the semantic representation capability of BERT.
Numerous studies have underscored that, in the context of ABSA, it is important to consider both the syntactic dependencies and the semantic interactions between aspect and context words [14,15]. These studies highlight the significance of the syntactic dependency tree, which encapsulates syntactic information and is encoded using a graph convolutional network (GCN). This encoding effectively bridges aspect words with corresponding opinion words in a syntactically coherent manner. However, a solitary dependency tree graph neither fully harnesses the latent information embedded within a sentence nor capitalizes on the robust feature fusion capacity inherent in GCNs. Modeling an enhanced syntactic dependency tree by adding additional nodes (such as sentences or knowledge) through a heterogeneous graph convolutional network (HGCN) can effectively mitigate the limitations of a single structured dependency tree [16]. Despite these advancements, most models employing HGCNs share a critical limitation: the vectors for aspect words and context words generated by the intermediate hidden layers are typically averaged to form explicit representations of the additional nodes [17,18]. This procedure unintentionally discards potentially significant sentiment features, thereby impairing the efficacy and performance of the ABSA model.
Knowledge graphs have emerged as a powerful tool for infusing external knowledge into neural network models, significantly augmenting their ability to comprehend semantic textual information. This integration of external knowledge is particularly beneficial for enhancing semantic features in ABSA tasks. Researchers have employed external knowledge to enrich the semantic dimensions of these tasks, primarily by using words (identified as aspect nodes within sentences) in the knowledge graph as foundational seed nodes. These seed nodes establish connections with context nodes within the graph. Despite the substantial performance improvements achieved through these methods [19][20][21], they do not entirely leverage the full spectrum of features offered by external knowledge. A critical concern is that potential features may be lost when transposing them into the graph as nodes. Additionally, constructing a knowledge subgraph for each sentence tends to be a complex and intricate task.
To address the issues stated above, we propose a new model: the KHGCN. A concatenation of the context and the aspect terms is fed into the encoding layer. The model obtains semantic and syntactic knowledge through BiLSTM and HGCN layers, respectively. To further enrich the constructed graph features, external sentiment knowledge is introduced during the graph construction phase. The knowledge embedding matrices for the context and the aspect are obtained through low-dimensional continuous embedding. These embeddings are then integrated with the semantic representations derived from the BiLSTM layer. Subsequently, an aspect-oriented knowledge representation is obtained through an attention mechanism. The semantic, syntactic, and knowledge feature representations thus captured are fused through a feature fusion layer. This fusion facilitates the prediction of sentiment polarity, leveraging the combined strengths of semantic understanding, syntactic structure, and external knowledge. The primary contributions of our work can be summarized as follows: (1) We propose a new knowledge-guided heterogeneous graph convolutional network for aspect-based sentiment analysis. Through the utilization of BiLSTM, an HGCN, and external knowledge, the model incorporates multifaceted features of semantics, syntax, and additional knowledge. (2) A dynamic weighting mechanism is proposed to address the underutilization of BERT and the inconsistency between BERT and GCN word segmentation in previous ABSA tasks. Sentence and aspect nodes, as well as their connection weights, are explicitly defined and enhanced with external sentiment knowledge when constructing the heterogeneous graph. (3) We also introduce external affective knowledge in a different manner, obtaining knowledge embeddings for both the aspect and the context to individually capture affective information corresponding to specific aspects.
The remaining portions of this paper are structured as follows. Section 2 summarizes previous research relevant to our work. Section 3 outlines the structure of our model, the KHGCN. Section 4 presents and explains the experimental results. Section 5 summarizes our findings.

Aspect-Based Sentiment Analysis
In the realm of ABSA, current methodologies primarily employing neural networks begin by analyzing the contextual information within a text. These methods concentrate on identifying crucial emotional cues to ascertain the polarity corresponding to particular aspect concepts [4][5][6][7][22,23]. With the goal of providing an accurate aspect representation, Majumder et al. [4] utilized a memory network to add specific information close to the aspect words. Gandhi et al. [6] used conditional random fields and a bi-directional LSTM to extract aspectual terms from text and model their sentiment. Utilizing an attention-based approach, a multi-granular attention mechanism was implemented by Zhu et al. [24] with the aim of enhancing the dependence between aspects and opinion words. Additionally, Xue and Li [22] utilized a gating mechanism in a gated CNN model to output sentiment information exclusively in accordance with a specified aspect. To further restrict the adverse effects of words unrelated to the aspect and constrain the propagation of information, Zhao et al. [25] constructed an aspect-oriented weighting mechanism and proposed a GCN model incorporating multiple weighting techniques.

Graph Convolutional Network
There is growing recognition of the limitations inherent in sequential models, particularly their neglect of syntactic relationships within sentences. The significance of syntactic relationships in comprehending and assessing sentiment in ABSA makes this omission noteworthy. Graph network-based ABSA models have been developed at an impressive rate in the past few years, showing heightened potential for extracting and interpreting syntactic relationships [26][27][28][29]. A heterogeneous graph neural network was proposed by Zhang et al. [30], which enriches the created graph characteristics with information from various node types and connectivity interactions. The model has shown good application in link prediction [31], node classification [32], and personalized recommendation [33]. A heterogeneous graph convolutional network sentiment classification model was presented by Zhang et al. [18]; it prunes the dependency tree and lessens the effect of noisy knowledge on the outcomes. Therefore, we believe that it is very meaningful to utilize an HGCN for ABSA.

Considering External Knowledge
Deep learning models are progressively incorporating outside knowledge, especially in the realm of natural language processing [34][35][36][37]. This trend underscores the significant role that both linguistic and general knowledge play in enhancing the comprehension of natural language. In the context of ABSA, where the primary objective is to analyze and interpret sentiment, the incorporation of external sentiment knowledge into models is especially advantageous. SenticNet7 [37] is an excellent public resource for categorizing sentiment and mining opinions and has performed very well in sentiment analysis tasks [38]. Liang et al. [39] employed a SenticNet-enhanced dependency graph and utilized contextual sentiment knowledge to enhance sentiment categorization efficiency. In constructing the graph, Xu et al. [40] incorporated sentiment knowledge and dynamically amalgamated information from the latent semantic graph and the enhanced dependency graph by employing a gating mechanism. By incorporating additional sentiment nodes into the heterogeneous graph, Zhang et al. [18] calculated the similarity between aspect nodes and knowledge nodes to determine the connection weights. When introducing sentiment knowledge, the above models simply calculate the sentiment scores of related words to enhance the dependency graph and do not fully utilize the sentiment vector representations in SenticNet. Thus, based on the rich sentiment knowledge in SenticNet, we extract an embedding matrix of sentiment information for aspect words and context by utilizing the 300,000-concept affective knowledge space.

Limitations
The related works mentioned above simply average the subword vectors to form word-level embeddings when implementing BERT, which limits the powerful semantic representation capability of BERT and also affects the performance of the ABSA model. Furthermore, the previous works do not comprehensively consider semantic, syntactic, and external knowledge features; very few studies consider all three at the same time. When conducting syntactic feature extraction, isomorphic graphs alone do not fully utilize the powerful feature fusion capabilities of the GCN. It is also necessary to account for the impact of additional nodes on the syntactic structure. Additionally, the incorporation of external knowledge is too simple to fully exploit the rich sentiment information in the affective space.

Methodology
We present a detailed illustration of the construction of the heterogeneous graphs and the KHGCN in this section. Figure 2 shows an overview of the proposed model.

Problem Description
The purpose of ABSA is to determine the sentiment polarity of aspects in a given sentence. Assume a sentence containing n words, S = {w_1, w_2, ..., w_{n-1}, w_n}, with a subsequence A = {w_τ, w_{τ+1}, ..., w_{τ+m-1}} representing an aspect in S. In this task, the aspect terms are labeled as Y = {0, 1, 2}, representing negative, neutral, and positive sentiment, respectively.
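As a minimal illustration of this formulation (the sentence, aspect span, and label values here are hypothetical examples, not drawn from the datasets):

```python
# Hypothetical instance of the ABSA input/label format described above.
sentence = ["Great", "food", ",", "but", "the", "environment", "is", "so", "bad", "!"]
aspect = sentence[1:2]          # A = {w_tau, ..., w_{tau+m-1}}; here the single word "food"
LABELS = {0: "negative", 1: "neutral", 2: "positive"}
label = 2                       # the aspect "food" carries positive sentiment

assert aspect == ["food"] and LABELS[label] == "positive"
```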

Embedding Based on BERT
Word embeddings are acquired by applying BERT, which encodes each component word into a high-dimensional vector. BERT [9] has demonstrated notable efficacy in the domain of contextual representation learning. The input sequence is constructed in the form of "[CLS] + context + [SEP] + aspect + [SEP]".

Dynamic Weighting Mechanism
Inconsistent tokens arise from the differences in word segmentation between BERT and the GCN [41,42]. Take, for example, the sentence "He hates playing games". BERT generates the tokens "[CLS]", "He", "hates", "play", "##ing", "games", ".", and "[SEP]", whereas the GCN operates on the tokens "He", "hates", "playing", "games", and ".". This is because the WordPiece segmentation strategy in BERT splits "playing" into two subwords: "play" and "##ing". The importance of different subwords within a word surely fluctuates. To address the problem of inconsistent word segmentation and improve the effectiveness of the BERT pre-trained model, we introduce a dynamic weighting mechanism to amalgamate subwords, as delineated in Algorithm 1. For an input sequence, we first obtain the index sequence of the subwords of each word. The corresponding embedding vector does not change if a word is segmented consistently by both BERT and the GCN. When the subwords are segmented inconsistently, the weight of each subword within the original word is determined using an exponential function; the subword weights are then normalized and used to compute a weighted sum, yielding the embedding vector of the word.
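The subword-merging step can be sketched as follows. This is a hedged reconstruction of the dynamic weighting idea, not the authors' code: the exact decay law (here, weight exp(-α·i) for the i-th subword) and the direction of decay are assumptions based on the description above; α matches the value 0.3 used in the experiments.

```python
import math

def merge_subwords(subword_vecs, alpha=0.3):
    """Merge BERT subword vectors into a single word vector.

    Sketch of the dynamic weighting mechanism: if a word is segmented
    consistently (one subword), its vector is unchanged; otherwise each
    subword gets an exponential weight (assumed form: exp(-alpha * i)),
    the weights are normalized, and a weighted sum is returned.
    """
    n = len(subword_vecs)
    if n == 1:                                        # consistent segmentation
        return subword_vecs[0]
    raw = [math.exp(-alpha * i) for i in range(n)]    # exponential weights (assumption)
    z = sum(raw)
    weights = [w / z for w in raw]                    # normalize to sum to 1
    dim = len(subword_vecs[0])
    return [sum(weights[i] * subword_vecs[i][d] for i in range(n)) for d in range(dim)]
```

With alpha = 0 the mechanism reduces to plain averaging, which is the baseline behavior the paper improves upon.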

BiLSTM Layer
BiLSTM is commonly applied to text and time-series data. It captures context-dependent token information using forward and backward LSTMs [6].
After the embedding layer, we obtain the context's embedding matrix E_s = {e_1^s, e_2^s, ..., e_n^s} and the aspect's embedding matrix E_a = {e_τ^a, e_{τ+1}^a, ..., e_{τ+m-1}^a}. These matrices are then input into independent BiLSTM layers. The context feature representation corresponds to the hidden states H_s = {h_1^s, h_2^s, ..., h_n^s}, where H_s ∈ ℝ^{n×2d_h}, and the representation of the corresponding aspect is H_a = {h_τ^a, ..., h_{τ+m-1}^a}, where H_a ∈ ℝ^{m×2d_h} and d_h is the hidden layer dimension of the BiLSTM.
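A minimal PyTorch sketch of the two independent BiLSTM encoders, assuming 768-dimensional BERT embeddings and an illustrative hidden size d_h = 128 (the class and variable names are ours, not the authors'):

```python
import torch
import torch.nn as nn

class ContextAspectEncoder(nn.Module):
    """Two independent BiLSTMs: one for the context E_s, one for the aspect E_a."""

    def __init__(self, emb_dim=768, d_h=128):
        super().__init__()
        self.ctx_lstm = nn.LSTM(emb_dim, d_h, bidirectional=True, batch_first=True)
        self.asp_lstm = nn.LSTM(emb_dim, d_h, bidirectional=True, batch_first=True)

    def forward(self, E_s, E_a):
        H_s, _ = self.ctx_lstm(E_s)   # (batch, n, 2*d_h): forward + backward states
        H_a, _ = self.asp_lstm(E_a)   # (batch, m, 2*d_h)
        return H_s, H_a

enc = ContextAspectEncoder()
H_s, H_a = enc(torch.randn(1, 10, 768), torch.randn(1, 2, 768))
```

The bidirectional flag doubles the output dimension, matching H_s ∈ ℝ^{n×2d_h} above.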

Detailed Description of the Heterogeneous Graph
A heterogeneous graph is denoted G(V, E, T_v, T_e, ϕ, ψ), where V is the node set and E is the edge set. The mapping ϕ: V → T_v assigns each node to its node type, and ψ: E → T_e assigns each edge to its edge type. In our work, each word in a sentence {w_1, w_2, ..., w_{τ+m-1}, ..., w_n} is a node in the graph, with |T_v| = 3 and |T_e| = 4. Apart from word nodes, our framework incorporates two supplementary node categories: aspect nodes, denoted t, and sentence nodes, denoted s. Below, we sequentially delineate the pertinent node types and the inter-node relations: (1) DT_ij: Employing the spaCy toolkit, we ascertain the syntactic dependencies within each given sentence. The relations of the syntactic dependency tree serve as the edges, whereas the associated lexical tokens constitute the nodes, thereby forging the connective topology.
We incorporate knowledge from SenticNet, integrating affective knowledge into the syntactic dependencies using SenticNet sentiment scores and the affective common-sense information shared between aspect words and context. Initially, we determine the sentiment score of each lexical pair connected by a syntactic dependency, where Sentic(w_i) ∈ [−1, 1] denotes the sentiment score of w_i in SenticNet. Conforming to the processing methodology delineated in [39], we then derive the final syntactic matrix. (2) TF-IDF: The TF-IDF metric is utilized to determine the relative importance of every word within a document or corpus. By employing TF-IDF to compute the connective weight between a word and a sentence, the model can decrease the impact of words that contribute only slightly to the meaning of the sentence. Consequently, this facilitates a model-centric emphasis on terms that furnish more substantive contributions.
(3) The mutual indication relationship between aspects and sentences, which has been shown to perform well in the ABSA task, is represented by an edge between the aspect node t and the sentence node s.
(4) DWIN_ij^aspect: This relation captures how word nodes affect the sentiment polarity of aspect nodes and vice versa. Typically, a word node's influence on an aspect increases with its closeness to the aspect. Thus, by adjusting the window size, one can determine whether and to what degree a word node affects the aspect node. In this article, the window is set dynamically according to the length of the input text. In the implementation, n is the text length, k controls the window size, and β is the attenuation factor that controls the degree of contribution; k and β are hyperparameters, which can be set separately for different datasets.
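The dynamic aspect window can be sketched as follows. The linear window size ⌊k·n⌋ and the per-step decay β^d are assumptions consistent with the roles of k and β described above, not the authors' exact formula:

```python
def aspect_window_weights(n, aspect_idx, k=0.3, beta=0.3):
    """Hypothetical sketch of the dynamic window: word nodes within
    floor(k * n) positions of the aspect are connected, with an edge
    weight that decays by a factor of beta per step of distance.
    (Both the window size formula and the decay law are assumptions.)
    """
    win = max(1, int(k * n))          # window grows with text length n
    weights = [0.0] * n
    for i in range(n):
        d = abs(i - aspect_idx)
        if d <= win:
            weights[i] = beta ** d    # closer words contribute more
    return weights
```

Note how larger β keeps distant words influential, while larger k widens the window, which matches the hyperparameter study reported later (Table 5).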
We represent the relationships in this heterogeneous graph with the adjacency matrix A ∈ ℝ^{(n+2)×(n+2)}, whose entries encode the edge weights determined by the relations defined above.
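For the SenticNet-enhanced dependency edges, one plausible construction follows the formulation popularized by Sentic-GCN [39], A_ij = D_ij × (Sentic(w_i) + Sentic(w_j) + 1); the helper below is an illustrative sketch of that reading, not the authors' exact code:

```python
def enhance_dependency(dep, words, sentic):
    """Fold SenticNet scores into dependency edge weights.

    dep:    binary (or weighted) dependency matrix D, dep[i][j] != 0 iff
            w_i and w_j are linked in the dependency tree.
    sentic: dict mapping a word to its SenticNet score in [-1, 1];
            words absent from SenticNet default to 0.
    Returns A with A_ij = D_ij * (Sentic(w_i) + Sentic(w_j) + 1),
    the Sentic-GCN-style enhancement (an assumption here).
    """
    n = len(words)
    A = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if dep[i][j]:
                s = sentic.get(words[i], 0.0) + sentic.get(words[j], 0.0)
                A[i][j] = dep[i][j] * (s + 1.0)
    return A
```

The "+1" offset keeps neutral edges (score 0) at their original dependency weight instead of erasing them.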

Graph Convolutional Networks
Through the multiple relationships described above, given a sentence, we can construct a heterogeneous graph G(V, E, T_v, T_e, ϕ, ψ, A), where A is the adjacency matrix. To create a new node representation matrix, H′ = {h_1^s, h_2^s, ..., h_n^s, s, t}, the representation vectors of the additional nodes, s and t, are combined with the context node vectors. The adjacency matrix A and the node vectors H′ are then fed into the GCN, where the graph convolution operation encodes local node information. With each convolutional iteration, each node incrementally accrues effective information from its immediate neighbors. Increasing the number of graph convolution layers enables a node to gather information from a broader neighborhood, allowing node-specific feature representations to be learned effectively. Drawing upon prior work [43,44], we posit that a single GCN layer is insufficient to acquire comprehensive information from neighboring nodes, whereas a deep multi-layer GCN inflates the model's complexity. Consequently, our work adopts a two-layer GCN.
Graph convolution updates each node's representation in a GCN layer. Concretely, each layer computes h_i^{(l+1)} = ReLU( Σ_j (A_ij / c_i) W^{(l)} h_j^{(l)} + b^{(l)} ), where D represents the degree matrix, c_i represents the normalization constant (the degree of node i, taken from D), and W^{(l)} and b^{(l)} represent learnable parameters.
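A minimal PyTorch sketch of the two-layer GCN described above, using degree normalization 1/c_i; the dimensions and module names are illustrative:

```python
import torch
import torch.nn as nn

class TwoLayerGCN(nn.Module):
    """Sketch of the two-layer GCN update:
    H <- ReLU(D^{-1} A H W + b), applied twice."""

    def __init__(self, dim=256):
        super().__init__()
        self.w1 = nn.Linear(dim, dim)   # W^(0), b^(0)
        self.w2 = nn.Linear(dim, dim)   # W^(1), b^(1)

    def forward(self, H, A):
        deg = A.sum(dim=-1, keepdim=True).clamp(min=1.0)  # c_i = degree of node i
        H = torch.relu(self.w1((A @ H) / deg))            # layer 1: aggregate + transform
        H = torch.relu(self.w2((A @ H) / deg))            # layer 2: two-hop neighborhood
        return H

# Toy graph with self-loops over 5 nodes (e.g., n words + sentence + aspect nodes)
A = torch.eye(5) + torch.rand(5, 5).round()
H = TwoLayerGCN()(torch.randn(5, 256), A)
```

Two layers let each node see its two-hop neighborhood, matching the design choice argued above.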

Aspect-Specific Mask
The enhanced aspect feature vector is acquired by applying a zero-mask layer after the GCN layer. The mask sets the representations of non-aspect nodes to 0, so only the feature representations of the aspect nodes remain.
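The zero-mask operation is straightforward to sketch (τ and m index the aspect span, as in the problem description; the function name is ours):

```python
import torch

def aspect_mask(H, tau, m):
    """Zero-mask layer: keep only the aspect nodes' rows
    (positions tau .. tau+m-1) and set all other node vectors to 0."""
    mask = torch.zeros(H.size(0), 1)
    mask[tau:tau + m] = 1.0
    return H * mask        # broadcasts over the feature dimension
```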

Affective Knowledge Graph
Words are the smallest meaningful units of language, and many words or phrases carry strong sentiment of their own. For example, the word "disastrously" typically conveys a negative sentiment in a sentence. We present a new approach that utilizes knowledge embedding and introduces the affective knowledge space from SenticNet7 [37] to better leverage the rich sentiment of the words themselves.
SenticNet7 constructs a three-level concept primitive knowledge representation, as shown in Figure 3, by first extracting concept primitives from the text and then connecting the concept primitives, the common knowledge concept layer, and the named entity layer. SenticNet7 contains approximately 300,000 concepts. In addition to providing conceptual-level representation, it assigns semantics and sentiment to individual words. The affective knowledge space maintains the relationship between semantics and sentiment by mapping the SenticNet7 concepts to continuous low-dimensional embeddings. Consequently, we can apply the matrix E_Aff to obtain the affective knowledge embeddings of the input text S and the aspect words A.
Since H_Aff is a 100-dimensional embedding vector, we first transform it into the dimensions of the BiLSTM layer's hidden state with a linear transformation. Then, we integrate the affective knowledge into the feature representation matrix through matrix addition, yielding the final knowledge graph embedding. We thereby obtain the external knowledge representation of the text. On this foundation, we can employ external affective knowledge to efficiently determine the sentiment polarity of aspect words in a sentence.
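A sketch of this projection-and-addition fusion, assuming the 100-dimensional affective embeddings and an illustrative BiLSTM hidden size 2·d_h with d_h = 128:

```python
import torch
import torch.nn as nn

d_h = 128
proj = nn.Linear(100, 2 * d_h)      # map 100-d affective vectors to the BiLSTM size

H_aff = torch.randn(10, 100)        # affective knowledge embeddings for 10 tokens
H_sem = torch.randn(10, 2 * d_h)    # BiLSTM hidden states for the same tokens

# Matrix addition fuses the projected affective knowledge with the
# semantic representation, giving the knowledge graph embedding.
H_know = H_sem + proj(H_aff)
```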

Attention Layer
In this layer, to extract long-term dependencies from the context and capture the interaction between aspect words and context, we utilize multi-head attention. Aspect-specific contextual information is updated through an aspect-aware attention mechanism.

Multi-Headed Attention
The self-attention mechanism learns long-term dependencies from the input sequence. We first project the input sequence to the representations Q, K, and V and then calculate the attention: each head computes head_i = Attention(Q_i, K_i, V_i) = softmax(Q_i K_i^T / √d_k) V_i, and the heads are combined as MultiHead(Q, K, V) = Concat(head_1, ..., head_h) W^O, where Q_i, K_i, and V_i are the projections for the i-th attention head, W^O is the learnable output parameter of the multi-head attention, and the remaining linear transformations are learnable parameters.
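This step maps directly onto PyTorch's built-in nn.MultiheadAttention (a sketch with illustrative dimensions; the per-head projections Q_i, K_i, V_i and the output matrix W^O correspond to the module's internal parameters):

```python
import torch
import torch.nn as nn

# Self-attention over the sequence: Q, K, and V are all projections
# of the same input, so we pass x three times.
mha = nn.MultiheadAttention(embed_dim=256, num_heads=8, batch_first=True)
x = torch.randn(1, 10, 256)          # (batch, sequence length, model dim)
out, attn = mha(x, x, x)             # out: new token representations
                                     # attn: head-averaged attention weights
```

Each row of the attention weights is a softmax distribution over the sequence, so it sums to 1.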

Aspect-Aware Attention Mechanism
To extract meaningful sentiment features from the representations of particular aspects, we implement an aspect-aware attention mechanism, consistent with the work in [45]. In this research, we apply this mechanism sequentially to the semantic, syntactic, and knowledge representations. For the semantic representation, the attention is computed over h_t^s and h_i^a, which are obtained through the BiLSTM layer. For the syntactic representation, the attention is computed over h_i^mask, the aspect-specific output obtained through the zero-mask layer. For the knowledge representation, the attention is computed over h_{aff,t}^s and h_{aff,i}^a, the knowledge embedding representations of S and A, respectively.

Feature Fusion
It would be difficult to take advantage of the complementarity between the representations if the semantic, syntactic, and affective knowledge embeddings were fused directly. As a result, we use the fusion strategy described in [45]. During local fusion, the three types of features are first concatenated in pairs: [Z_c; Z_s], [Z_c; Z_k], and [Z_s; Z_k]. The concatenated representations are then passed through independent fully connected layers to yield the pairwise sentiment features Z_cs, Z_ck, and Z_sk. To obtain the sentiment prediction probability, we then fuse them globally: after concatenation, they are fed into a 3 × 3 convolutional layer.
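A sketch of this local-then-global fusion under illustrative dimensions; stacking the three pairwise features into a 3-row map for the 3 × 3 convolution is our reading of the description, not necessarily the authors' exact layout:

```python
import torch
import torch.nn as nn

d = 256  # illustrative feature dimension
fc_cs, fc_ck, fc_sk = (nn.Linear(2 * d, d) for _ in range(3))
conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)  # 3x3 global fusion convolution

Z_c, Z_s, Z_k = (torch.randn(1, d) for _ in range(3))

# Local fusion: pairwise concatenation, then independent FC layers.
Z_cs = fc_cs(torch.cat([Z_c, Z_s], dim=-1))
Z_ck = fc_ck(torch.cat([Z_c, Z_k], dim=-1))
Z_sk = fc_sk(torch.cat([Z_s, Z_k], dim=-1))

# Global fusion: stack the pair features into a (1, 1, 3, d) map and convolve.
stacked = torch.stack([Z_cs, Z_ck, Z_sk], dim=1).unsqueeze(1)
fused = conv(stacked).flatten(1)   # flattened feature fed to the classifier
```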

Sentiment Classification
Ultimately, we take the output of the feature fusion layer, ŷ, as the final sentiment prediction and train with a cross-entropy loss, L = −Σ_i y_i log ŷ_i, where y_i is the true label and ŷ is the predicted label.
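This objective can be written directly with PyTorch's cross-entropy (the logits below are toy values for the three polarity classes):

```python
import torch
import torch.nn.functional as F

# Cross-entropy between the predicted distribution over
# {0: negative, 1: neutral, 2: positive} and the gold label.
logits = torch.tensor([[0.2, 0.1, 2.0]])   # output of the fusion layer (toy values)
gold = torch.tensor([2])                   # gold label: positive
loss = F.cross_entropy(logits, gold)       # combines log-softmax + NLL
```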

Datasets and Experimental Details
Three public benchmark datasets, Lap14 [46], Rest15 [47], and Rest16 [48], were utilized for our research. The statistics for each polarity of the datasets are shown in Table 1.
To show the datasets in more detail, statistics on the sentence lengths are shown in Table 2. Most sentence lengths fall in the range of 10-30, and few sentences exceed a length of 50. A more intuitive distribution of the sentence lengths can be seen in Figure 4. In our experiments, the depth of the GCN was set to 2, the coefficient of the L2 regularization term was set to 0.00001, and α in Algorithm 1 was set to 0.3. The hidden state vectors had a dimensionality of 768. The dimension of the word embeddings was set to 768 based on the pre-trained BERT-base-uncased model, and the model parameters were optimized and updated using the Adam optimizer. We fixed the learning rate to 0.00002 and the batch size to 32, utilizing Xavier initialization [49] for the parameters. Dropout was applied to the word embeddings with a rate of 0.5 to prevent overfitting. The KHGCN model was trained on an RTX 3080Ti GPU with a 12 vCPU Intel(R) Xeon(R) Silver 4214R CPU @ 2.40 GHz, and the model was built using PyTorch 2.0.0.

Evaluation Metrics
In the present study, we employed the same accuracy (Acc.) and F1 score metrics as previous ABSA models. Accuracy evaluates how effectively the model performs overall by quantifying the proportion of correctly categorized samples among all samples. The F1 score summarizes a classification model's precision and recall in a single value. When processing data with unbalanced categories, the macro-F1 score is a highly beneficial metric because it averages the F1 score over the categories. TP denotes the number of true positives, FP the number of false positives, FN the number of false negatives, and TN the number of true negatives. The metrics are computed as Accuracy = (TP + TN)/(TP + TN + FP + FN), Precision = TP/(TP + FP), Recall = TP/(TP + FN), and F1 = 2 × Precision × Recall/(Precision + Recall).
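These metrics can be computed directly from the counts; a plain-Python sketch:

```python
def accuracy(y_true, y_pred):
    """Proportion of correctly classified samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def macro_f1(y_true, y_pred, labels=(0, 1, 2)):
    """Macro-F1: the unweighted mean of per-class F1 scores, where
    F1 = 2*P*R / (P + R), P = TP / (TP + FP), R = TP / (TP + FN)."""
    f1s = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1s.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return sum(f1s) / len(f1s)
```

Because macro-F1 weights every class equally, a model that ignores a rare class is penalized even if its overall accuracy stays high.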

Baseline
We contrasted our model with other baseline models used for the ABSA task, which are detailed below:
• MemNet [50]: Employs a multi-hop architecture, context processing, and external memory.
• AOA [51]: Adapts the attention-over-attention technique from machine translation to the realm of ABSA.
• IAN [52]: Constructs representations for aspects and contexts independently after interactively learning attentions in aspects and contexts.
• ASGCN [14]: Applies a GCN to sentence dependency trees to take advantage of syntactic information.
• AEGCN [53]: Utilizes an improved GCN that combines phrase dependency trees with multi-head attention.
• SK-GCN [19]: Utilizes a mixed syntax and external knowledge model that successfully integrates external knowledge with syntactic information.
• AHGCN [16]: Employs a new GCN-based model using heterogeneous graphs.
• ADHGCN [18]: Utilizes a new heterogeneous graph construction method, which adds post-pruning on top of the traditional construction method.

• Sentic LSTM [54]: Employs a common-sense knowledge solution for targeted sentiment analysis of aspect words.
• Sentic-GCN [39]: Incorporates SenticNet into a model that makes full use of syntactic relationships and sentiment common sense to enhance dependency trees.
• DM + GCN + BERT [55]: Performs dynamic and multi-channel GCN modeling of syntactic and semantic information in sentences.
• SGGCN + BERT [56]: Alters the graph-based model's hidden vectors to make the most of information from the aspects.
• AIEN + BERT [57]: Constructs an interaction encoder using a GCN and attention mechanisms to extract interaction features.
• KHGCN (ours): Utilizes a dynamic weighting mechanism to acquire word-level embeddings during the encoding phase; employs BiLSTM, an HGCN, and the affective space to obtain semantic, syntactic, and external sentiment features, respectively; and utilizes an attention mechanism to extract features for sentiment prediction.

Performance Comparison
The experimental results, as indicated in Table 3, demonstrate that our proposed KHGCN performed better overall compared to all other models examined on the three benchmark datasets.
In particular, the proposed KHGCN performed noticeably better than earlier homogeneous and heterogeneous graph models based on GCNs (ASGCN, AEGCN, SK-GCN, AHGCN, and ADHGCN), verifying the effectiveness and feasibility of utilizing emotional knowledge encoding as well as heterogeneous graphs for sentence dependency enhancement. Poorer macro-F1 scores were achieved on the Rest16 dataset compared to the strong GCN-based Sentic-GCN model. In addition, among the BERT-based models, our proposed KHGCN outperformed the vanilla BERT model on the three public datasets. Although it was not as competitive as the SGGCN + BERT model on the Lap14 dataset, both the Acc. and macro-F1 values exhibited significant improvements over this model on the Rest15 and Rest16 datasets, demonstrating the effectiveness of the dynamic weighting mechanism we proposed in the BERT encoding stage. The results show that our proposed knowledge-guided heterogeneous graph convolutional network approach is effective.

Ablation Study
The detailed results of the ablation experiments are shown in Table 4. The performance of the model without sentiment knowledge encoding (w/o Z_k) was unsatisfactory on all datasets, which indicates that our way of encoding sentiment knowledge provides the corresponding sentiment information for the contexts and aspects. Meanwhile, after removing the semantic and syntactic branches (w/o Z_c, w/o Z_s), the model's performance decreased by varying degrees. We can conclude that the BiLSTM and HGCN layers are effective components for extracting semantic and syntactic features. In addition, the experiment without the DWM yielded worse results overall compared to the benchmark tests, again confirming that integrating encoded information with the dynamic weighting mechanism in the BERT encoding phase is effective.

Parameter Experiment
Since the KHGCN model uses a dynamic window mechanism to construct the heterogeneous graphs, we investigated how this mechanism's hyperparameters, k and β, impact the model's performance, as illustrated in Table 5. The optimal Acc. and macro-F1 values on the Lap14 and Rest16 datasets were achieved with k = 0.3 and β = 0.3; the optimal Acc. value on Rest15 was achieved with k = 0.3 and β = 0.4; and the optimal macro-F1 value on Rest15 was achieved with k = 0.2 and β = 0.6. The reason for this result may be that in the Rest15 dataset, there are large differences in the data distributions of the different categories.

Complexity Analysis
In this subsection, we chose AHGCN, Sentic-GCN, and AIEN + BERT to compare the number of parameters and the runtimes of the models; the results are shown in Table 6. Since the data pre-processing, heterogeneous graph generation, and external knowledge embedding only need to be performed once, the generated embedding matrix can be reused. Therefore, this part was ignored when counting the number of parameters and the runtimes of the models. As we can observe from Table 6, the number of parameters in our model was significantly higher than those of AHGCN and Sentic-GCN and slightly lower than that of AIEN + BERT. In terms of time complexity, our model converged faster on the Rest16 dataset, and the runtimes of the different models on the remaining two datasets did not differ significantly. Thus, the model in this paper sacrifices space complexity while performing better on the Acc. and macro-F1 metrics.

Discussion
The comparison of the experimental results above shows that the proposed model performs well. Table 4 shows that each feature extraction module, whether semantic, syntactic, or external knowledge, plays an invaluable role in text feature extraction. This demonstrates that an ABSA model can incorporate multiple feature extraction modules simultaneously, which benefits classification accuracy. To address the inconsistency between the word segmentations of the GCN model and the BERT model, this paper proposes a dynamic weight mechanism for producing word embeddings with BERT; the experimental results verify the effectiveness of this method, offering a viable approach for subsequent research. Although external knowledge embedding introduces additional complexity, it is a one-time process, and the resulting embeddings can be reused. Furthermore, each of the aforementioned modules can be used independently as a feature extraction branch of an ABSA model, making the approach highly scalable and convenient for subsequent researchers.

The proposed model can benefit domains that require sentiment analysis. In online shopping, for example, sellers can analyze customers' likes and dislikes of various products through reviews and adjust their shelving strategies accordingly. Another area is social media monitoring, which can help companies track user sentiment on social platforms and understand the public's views on specific issues, products, or events.
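The dynamic weight mechanism for subword merging can be sketched from the algorithm fragment accompanying Figure 2, where each subword j of a word receives an exponentially decaying weight e^(−αj). A minimal version, assuming the weights are normalized to sum to 1 before the weighted sum (the value of α here is illustrative):

```python
import math

def merge_subwords(subword_vectors, alpha=0.5):
    """Merge BERT subword vectors into one word vector.

    Each subword j gets weight e^(-alpha * j), normalized over the word's
    subwords, so earlier subwords contribute more than a plain average
    would allow. `alpha` is a decay factor (illustrative value).
    """
    n = len(subword_vectors)
    raw = [math.exp(-alpha * j) for j in range(n)]
    total = sum(raw)
    weights = [w / total for w in raw]
    dim = len(subword_vectors[0])
    # Weighted sum over subwords, per embedding dimension.
    return [sum(weights[j] * subword_vectors[j][d] for j in range(n))
            for d in range(dim)]
```

With α = 0 this degenerates to the plain subword averaging used in prior work, which makes the ablation of the DWM easy to express.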

Case Study
In this section, to better analyze how decisions are made within the KHGCN, we present the specific decision-making process for the test example "The staff should be a bit more friendly"; the results are shown in Figure 5. In the text semantic extraction stage, the model notices the key negative expression "should be". In the syntactic structure stage, the model focuses on the connection between the sentence nodes and the aspect word "staff". Simultaneously, through external knowledge embedding, the model learns words carrying clear emotional information, such as "bit" and "friendly". By combining these three stages of sentiment extraction, the KHGCN accurately predicts the sentiment polarity of the aspect "staff".
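The knowledge-side behavior in this example can be sketched as dot-product attention in which the aspect embedding queries each context word's knowledge embedding, so that affect-laden words such as "friendly" receive higher weight. This is a generic scaled dot-product sketch, not the KHGCN's exact scoring function.

```python
import math

def aspect_attention(aspect_vec, knowledge_vecs):
    """Softmax dot-product attention of an aspect over knowledge embeddings.

    Returns the attention weights and the attention-pooled knowledge
    representation for the aspect.
    """
    d = len(aspect_vec)
    # Scaled dot-product score of the aspect against each context word.
    scores = [sum(a * k for a, k in zip(aspect_vec, kv)) / math.sqrt(d)
              for kv in knowledge_vecs]
    m = max(scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    weights = [e / z for e in exps]
    pooled = [sum(w * kv[i] for w, kv in zip(weights, knowledge_vecs))
              for i in range(d)]
    return weights, pooled
```

A context word whose knowledge embedding aligns with the aspect's direction receives the larger share of attention, which is the behavior the case study visualizes.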

Conclusions
In response to the complexity and inefficiency with which the traditional literature integrates external knowledge, and its practice of merely averaging BERT subword vectors to generate word embeddings, we propose a knowledge-guided heterogeneous graph convolutional network for aspect-based sentiment analysis to address these limitations. Concretely, we propose a dynamic weight mechanism for merging subword vectors in the BERT embedding layer. In addition, by embedding a knowledge graph, the model can acquire additional information and use the attention mechanism to generate aspect-oriented knowledge representations. Finally, feature fusion is utilized to dynamically combine the semantic, syntactic, and knowledge feature representations. The experimental results demonstrate that the proposed KHGCN performs better overall on three benchmark datasets than all other models examined.
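The dynamic combination of the three branch representations can be sketched as a gated fusion: each branch receives a score, the scores are softmax-normalized, and the outputs are mixed accordingly. How the scores are learned is abstracted into `gate_scores` here; this is a hedged sketch, not the paper's exact fusion layer.

```python
import math

def fuse_features(z_c, z_s, z_k, gate_scores):
    """Dynamically combine semantic (z_c), syntactic (z_s), and knowledge
    (z_k) representations using softmax-normalized gate scores.

    `gate_scores` is a list of three scalars, one per branch, standing in
    for a learned scoring function.
    """
    m = max(gate_scores)  # subtract max for numerical stability
    exps = [math.exp(s - m) for s in gate_scores]
    z = sum(exps)
    a = [e / z for e in exps]  # branch mixing weights, summing to 1
    return [a[0] * c + a[1] * s + a[2] * k
            for c, s, k in zip(z_c, z_s, z_k)]
```

Because the weights sum to 1, zeroing out one branch's score does not silence it; the ablation variants (w/o Z_c, Z_s, Z_k) instead drop the branch from the fusion entirely.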

Figure 1 .
Figure 1. ABSA of the statement "Great food, but the environment is so bad!".

Figure 2 .
Figure 2. The detailed architecture of the KHGCN.
Algorithm fragment (subword weighting): for j = 0 to len(L(i)) − 1 do: weight[j] = e^(−αj), where e is the natural constant and α is a decay factor.

Figure 3 .
Figure 3. A sketch of the SenticNet 7 semantic network of three-level knowledge representation.

Figure 4 .
Figure 4. Histogram of sentence length statistics on the three datasets.
Figure 5. Decision process for the test example (Label: Negative; Predicted: Negative).

Table 1 .
Statistics of the datasets.

Table 2 .
The length distributions of the datasets.

Table 3 .
Performance comparison. (The best result for each model is shown in bold, and the second-best result is underlined).

Table 5 .
Performance using different values of k and β.

Table 6 .
Runtime and number of parameters of each model (time denotes the time per epoch to reach convergence).