Axioms
  • Article
  • Open Access

9 July 2025

Quantum-Inspired Attention-Based Semantic Dependency Fusion Model for Aspect-Based Sentiment Analysis

School of Computer Science, Xi’an Polytechnic University, Xi’an 710600, China
* Authors to whom correspondence should be addressed.

Abstract

Aspect-Based Sentiment Analysis (ABSA), which focuses on aspect-level sentiment representation within sentences, has gained significant attention in recent years. Current methods for ABSA often use pre-trained models and graph convolution to represent word dependencies. However, they struggle with long-range dependencies in lengthy texts, which leads to averaging and loss of contextual semantic information. In this paper, we explore how richer semantic relationships can be encoded more efficiently. Inspired by quantum theory, we construct superposition states from text sequences and apply quantum measurements to them to explicitly capture complex semantic relationships within word sequences. Specifically, we propose an attention-based semantic dependency fusion method for ABSA, which employs a quantum embedding module to create a superposition state of real-valued word sequence features in a complex-valued Hilbert space. This approach yields a word sequence density matrix representation that improves the handling of long-range dependencies. Furthermore, we introduce a quantum cross-attention mechanism to integrate sequence features with dependency relationships between specific word pairs, aiming to capture the associations between particular aspects and comments more comprehensively. Our experiments on the SemEval-2014 and Twitter datasets demonstrate the effectiveness of the quantum-inspired attention-based semantic dependency fusion model for the ABSA task.

1. Introduction

ABSA focuses on identifying sentiment polarities towards specific aspects within a given text, enabling fine-grained understanding of user opinions. Unlike general sentiment analysis, ABSA requires models to precisely align sentiment expressions with their corresponding aspect terms. This task is inherently complex due to the variability and ambiguity in natural language. Sentiment-related information is often scattered across the sentence, and not all contextual words are equally relevant to the target aspect. In many cases, only a few context words directly influence the sentiment orientation for a given aspect, while others serve as noise or distractors. Accurately capturing the aspect-context alignment is essential for reliable sentiment classification.
With the growth of product reviews and social media, ABSA has been widely used in many practical applications to provide more detailed and valuable insights by analyzing sentiment at a granular level. Traditional approaches struggle with this granularity, and even early deep learning models often treat all words with equal or fixed importance. A more selective mechanism is therefore needed to align aspect terms with their corresponding expressions of opinion. In response, attention mechanisms have emerged as an effective solution, allowing models to dynamically focus on the most informative parts of the context with respect to each aspect. For example, Tang et al. [1] analyzed in depth the correlation between contextual vocabulary and specific aspects using Long Short-Term Memory (LSTM) networks, and significantly improved model performance by introducing differential attention weights. These weights accurately reflect the relative importance of contextual vocabulary for sentiment judgement on particular aspects. Subsequently, numerous studies have utilized the attention mechanism to meticulously delineate the intricate relationships between aspect terms and contextual opinion words. Since then, some researchers have also used syntactic dependency trees to obtain richer structural and syntactic information. Dependency trees, as graph-structured data, allow semantic knowledge to be captured with Graph Neural Networks (GNNs); therefore, most recent work on dependency trees [2,3] builds on GNNs. In the context of the ABSA task, several studies have employed Graph Convolutional Networks (GCNs) in conjunction with pre-trained language models to integrate semantic and syntactic information to varying extents. However, the neighborhood aggregation mechanism inherent in GCNs often averages or dilutes the contextual information associated with aspect words. Additionally, when modeling dependency trees, each layer of a GNN updates node representations through message passing from neighboring nodes. This computation scales linearly with both the size of the graph and the number of layers, resulting in a computational cost significantly higher than the dot-product computation used in attention mechanisms. While GNNs offer certain advantages in explicitly modeling dependencies, their drawbacks, including high computational complexity, reliance on graph structure, and limited capacity to manage long-distance dependencies, render them less efficient and flexible than attention mechanisms in some scenarios. Furthermore, the interpretability of these classical neural network models is frequently called into question.
Most of the models mentioned above represent their features with real values, so the sparsity of the weight distribution is also an issue to consider. Because complex-valued vectors have richer representation capabilities, several quantum probability-driven networks have been proposed in recent years to model different levels of semantic units by extending word embeddings to complex-valued representations. Compared to real vectors, complex vectors inherently offer greater representational flexibility by encoding both magnitude and phase, which is especially beneficial for capturing contextual nuances and interference effects in language. This has led to growing interest in quantum-inspired approaches that naturally adopt complex-valued vector spaces for linguistic modeling. Drawing inspiration from quantum theory, such methods aim to provide a more structured and interpretable framework, where semantic units are not just embedded in continuous spaces but are also governed by probabilistic and superpositional principles. Based on this background, Li et al. [4] constructed a complex-valued network for question–answer tasks, and Chen et al. [5] proposed a quantum entanglement module to learn inseparable correlation information between word states, confirming the richness and interpretability of entanglement theory for semantic feature encoding. Yu et al. [6] constructed quantum entanglement-based sentence representations that entangle two consecutive conceptual words together, and proposed two dimensionality reduction models using numerical computation of tensor multiplication to mine more semantic information. In addition to quantum entanglement, quantum coherence is another central concept in quantum theory, and it provides unique inspiration for language systems. It is a fundamental concept that not only characterizes the quantum properties of a single-partite quantum state but also describes the correlations among multipartite quantum states. If a system exhibits quantum entanglement, it necessarily possesses quantum coherence. Since quantum entanglement can be used to encode text strings and quantify inter-word correlations using entanglement entropy, can quantum coherence also quantify inter-word correlations?
Driven by the compatibility of quantum language models with complex-valued vectors, we investigate the relationship between quantum coherence and textual semantics to address the ABSA task. While considerable effort has been devoted to designing high-performance graph converters, there has been limited research on applying quantum-inspired theoretical methods to ABSA. Therefore, we propose a quantum-inspired attention-based semantic dependency fusion model (hereafter QEDFM), which constructs superposition states and characterizes and fuses textual semantics and utterance dependencies by introducing complex-valued embeddings. This approach seeks to model the diversity and complexity of inter-word subsystems in text and to enhance the coding efficiency of textual features. To further capture contextual dependencies, we also introduce a Complex-Valued Quantum Cross-Attention Mechanism (QCA) that efficiently fuses dependencies, enabling the model to focus on those dependencies in utterances that are most relevant to aspect terms, thereby significantly reducing computational overhead.
Our contributions are as follows:
  • We propose a quantum embedding module to build complex semantic systems and, for the first time, quantify inter-word relations in terms of quantum coherence.
  • We propose a quantum cross-attention mechanism that emphasizes specific combinations of quantum states and dependencies, aiming to enhance the efficiency of fusing textual features with their associated dependencies.
  • Through numerous experiments on the ABSA task, we demonstrate the effectiveness of QEDFM. Additionally, visualization experiments for the quantum embedding module offer an intuitive understanding of this coding approach. The relative entropy of coherence experiments further enhance the model’s interpretability.

3. Materials and Methods

Since our work is based on quantum theory, and drawing on the research of Khrennikov [27] on quantum-like frameworks and of Alodjants et al. [28] on quantum-inspired modeling of user cognition, we first introduce the necessary background on quantum theory in this section. Following this, we elaborate on our proposed model for the ABSA task. The overall structure of QEDFM is illustrated in Figure 1. QEDFM consists of four components: sequence preprocessing, an embedding module, a dependency fusion module, and sentiment classification.
Figure 1. The structure of QEDFM.

3.1. Preliminaries on Quantum Theory

3.1.1. Quantum State

State vectors in quantum theory are defined on a Hilbert space $\mathcal{H}$, a vector space equipped with an inner product, in which the states of a quantum system are represented as unit vectors [29,30]. We denote a complex unit vector $u$ as the ket $|u\rangle$, its conjugate transpose as the bra $\langle u|$, and the inner and outer products of two state vectors $|u\rangle$ and $|v\rangle$ as $\langle u|v\rangle$ and $|u\rangle\langle v|$, respectively.
A quantum state $|\psi\rangle$ is a complete description of a physical system and is a linear superposition of orthonormal basis states in Hilbert space. The state of a system consisting of a single particle is called a pure state. Mathematically, $|\psi\rangle$ is a complex-valued column vector. A pure state can also be expressed as a density matrix: $\rho = |\psi\rangle\langle\psi|$. When several pure states are mixed together in a classically probabilistic manner, we describe the system by a mixed state, which the density matrix can also represent: $\rho = \sum_{i=1}^{n} p_i |\psi_i\rangle\langle\psi_i|$, where $p_i$ denotes the probability of each pure state and $\sum_{i=1}^{n} p_i = 1$. The density matrix corresponding to a superposition state can likewise be expressed as a weighted combination of the density matrices of the basis states. In this paper, a text sequence consists of a number of words, where each word can be viewed as a pure state, and the text sequence is a linear combination of these pure states.
Any quantum pure state can be described by a unit vector in $\mathcal{H}$, which can be expanded over the ground states as
$|q\rangle = a_1 |x_1\rangle + \cdots + a_n |x_n\rangle$
where $\{|x_1\rangle, |x_2\rangle, \ldots, |x_n\rangle\}$ are the orthonormal basis vectors spanning the Hilbert space, and the complex-valued probability amplitudes $\{a_i\}$ satisfy $\sum_{i=1}^{n} |a_i|^2 = 1$. The set $\{|a_i|^2\}$ defines a classical discrete probability distribution; a quantum system can simultaneously be in a superposition of the different states $|x_i\rangle$, each observed with probability $|a_i|^2$. If the quantum system is in the state of Equation (1), then the system is physically in a superposition of $\{|x_1\rangle, |x_2\rangle, \ldots, |x_n\rangle\}$.
For any quantum state $\rho$, the state must satisfy $\mathrm{tr}(\rho) = 1$, $\rho = \rho^{\dagger}$, and $\rho \ge 0$. In this paper, we focus on quantum pure states, so the inputs and outputs of the neural network module are complex-valued vectors representing pure states.
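As an illustration of these definitions, the following minimal NumPy sketch (not part of the original model code) builds a density matrix from a hypothetical pure state and checks the three defining properties.

```python
# Minimal sketch (NumPy): a pure state, its density matrix rho = |psi><psi|,
# and checks of tr(rho) = 1, rho = rho^dagger, rho >= 0. Illustrative only.
import numpy as np

psi = np.array([0.6, 0.8j, 0.0], dtype=complex)   # hypothetical 3-dim pure state
psi = psi / np.linalg.norm(psi)                   # enforce unit norm

rho = np.outer(psi, psi.conj())                   # density matrix of the pure state

print(np.isclose(np.trace(rho).real, 1.0))        # unit trace
print(np.allclose(rho, rho.conj().T))             # Hermitian
print(np.all(np.linalg.eigvalsh(rho) >= -1e-12))  # positive semidefinite
```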

3.1.2. Quantum Evolution

In quantum mechanics, the Schrödinger equation describes how a system changes over time. It does so by relating changes in the state of the system to the energy of the system, which is given by an operator called the Hamiltonian. However, the Schrödinger equation is often difficult to solve directly, so a unitary transformation is applied to the Hamiltonian to obtain a solution of the original equation [29]. The evolution is described by a unitary operator $U$, a bounded linear operator on the Hilbert space represented by a complex unitary matrix satisfying $U^{\dagger} U = U U^{\dagger} = I$, and the evolution proceeds as follows:
$\rho' = U \rho U^{\dagger}$
where, as long as $\rho$ is a density matrix, the evolved state $\rho'$ is also a density matrix. In this paper, we regard this evolution as a linear transformation of the density matrix.
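For concreteness, a small NumPy sketch of the evolution $\rho' = U \rho U^{\dagger}$ is given below; the unitary here is a random one obtained from a QR decomposition and is purely illustrative, not a component of the model.

```python
# Sketch of unitary evolution rho' = U rho U^dagger with a random unitary
# (obtained via QR decomposition); the evolved state remains a density matrix.
import numpy as np

rng = np.random.default_rng(0)
d = 4
A = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
U, _ = np.linalg.qr(A)                     # U satisfies U^dagger U = I

psi = rng.normal(size=d) + 1j * rng.normal(size=d)
psi /= np.linalg.norm(psi)
rho = np.outer(psi, psi.conj())

rho_evolved = U @ rho @ U.conj().T
print(np.isclose(np.trace(rho_evolved).real, 1.0))   # still unit trace
```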

3.1.3. Quantum Coherence

Quantum coherence, as a pivotal physical resource, plays a crucial role in quantum information processing. In 2014, Baumgratz, Cramer, and Plenio [31] proposed a theoretical framework for quantifying quantum coherence. Within this framework, quantifying coherence requires first fixing a set of orthonormal bases $\{|i\rangle\}$, $i = 1, \ldots, d$, in a $d$-dimensional Hilbert space. A quantum state is deemed incoherent if it can be expressed in the form $\hat{\delta} = \sum_{i=1}^{d} \delta_i |i\rangle\langle i|$; otherwise, it is considered a coherent state. Within the resource-theoretic framework of quantum coherence, there exist various measures for quantifying coherence. This paper introduces only two such measures: the relative entropy of coherence and the $l_1$-norm of coherence.
For the quantum relative entropy $S(\varrho \| \sigma) = \mathrm{tr}[\varrho \log \varrho] - \mathrm{tr}[\varrho \log \sigma]$, the induced measure is denoted $C_r$. In addition to satisfying all the requirements for a coherence measure within the theoretical framework, $C_r$ admits a closed-form solution, thus avoiding the need for minimization. Let $\delta = \sum_i \delta_i |i\rangle\langle i| \in \mathcal{I}$, and for a given $\varrho = \sum_{i,j} \varrho_{i,j} |i\rangle\langle j|$, denote $\varrho_{\mathrm{diag}} = \sum_i \varrho_{i,i} |i\rangle\langle i|$. Then, $S(\varrho \| \delta) = S(\varrho_{\mathrm{diag}}) - S(\varrho) + S(\varrho_{\mathrm{diag}} \| \delta)$, and hence,
$C_r(\varrho) = \min_{\delta \in \mathcal{I}} S(\varrho \| \delta) = S(\varrho_{\mathrm{diag}}) - S(\varrho)$
Utilizing this formula, we can readily ascertain the maximum possible value of coherence for a given state: any state $\varrho$ satisfies $C_r(\varrho) \le S(\varrho_{\mathrm{diag}}) \le \log(d)$, where $d$ is the dimension of the Hilbert space, and this also provides the limit attained by the maximally coherent state. This relative entropy measure has also been considered in the context of quantifying superposition.
The $l_1$-norm provides an intuitive measure of coherence that inherently relates to the off-diagonal elements of the quantum state under consideration, so it is natural to quantify coherence through a functional of these off-diagonal elements. $C_{l_1}(\varrho) = \sum_{i \ne j} |\varrho_{i,j}|$ is a widely adopted quantifier of coherence. Satisfying all the requirements for a coherence measure within the theoretical framework, it constitutes another intuitive coherence monotone with a simple closed form, induced by the $l_1$ matrix norm $D_{l_1}(\varrho, \delta) = \|\varrho - \delta\|_{l_1} = \sum_{i,j} |\varrho_{i,j} - \delta_{i,j}|$. Thus, the $l_1$-norm of coherence and the relative entropy of coherence constitute the coherence monotones adopted in this paper.
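The two measures admit direct numerical implementations. The sketch below (NumPy, base-2 logarithm, illustrative rather than taken from the released code) computes $C_r$ and $C_{l_1}$ for a maximally coherent qubit.

```python
# Sketch of the two coherence measures: relative entropy of coherence
# C_r(rho) = S(rho_diag) - S(rho) and the l1-norm of coherence
# (sum of absolute off-diagonal elements). Base-2 logarithm is used here.
import numpy as np

def von_neumann_entropy(rho):
    evals = np.linalg.eigvalsh(rho)
    evals = evals[evals > 1e-12]              # drop numerical zeros
    return float(-np.sum(evals * np.log2(evals)))

def relative_entropy_of_coherence(rho):
    rho_diag = np.diag(np.diag(rho))          # dephased state in the fixed basis
    return von_neumann_entropy(rho_diag) - von_neumann_entropy(rho)

def l1_norm_of_coherence(rho):
    return float(np.sum(np.abs(rho)) - np.sum(np.abs(np.diag(rho))))

# Maximally coherent qubit (|0> + |1>)/sqrt(2): C_r = 1 bit, C_l1 = 1.
psi = np.array([1.0, 1.0], dtype=complex) / np.sqrt(2)
rho = np.outer(psi, psi.conj())
print(relative_entropy_of_coherence(rho), l1_norm_of_coherence(rho))
```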

3.1.4. Quantum Measurement

In quantum mechanics, a positive operator-valued measure [32] (POVM) takes a system from an uncertain state to a definite event by projecting the state onto a corresponding basis state. The measurement process is described by an observable $M$:
$M = \sum_{i=1}^{n} \lambda_i |m_i\rangle\langle m_i|$
where $\{|m_i\rangle\}$ are the eigenstates of the measurement operator and form an orthonormal basis of the Hilbert space, and $\{\lambda_i\}$ are the corresponding eigenvalues. According to Born's rule [33], the probability that the pure state $|\psi\rangle$ collapses to the ground state $|m_i\rangle$ is calculated as follows:
$P_i = |\langle m_i | \psi \rangle|^2 = \mathrm{tr}(\rho\, |m_i\rangle\langle m_i|)$
where $\rho = |\psi\rangle\langle\psi|$. For a mixed state, the probability of collapsing to an eigenstate is the weighted sum of the corresponding pure-state probabilities. We use quantum measurements to compute the weights of utterance dependencies fused with textual features and to identify the final sentiment.
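A short NumPy sketch (computational basis, illustrative) verifies that $P_i = |\langle m_i|\psi\rangle|^2 = \mathrm{tr}(\rho\,|m_i\rangle\langle m_i|)$ yields a valid probability distribution.

```python
# Sketch of Born-rule probabilities for a pure state measured in the
# computational basis: P_i = |<m_i|psi>|^2 = tr(rho |m_i><m_i|).
import numpy as np

psi = np.array([0.6, 0.8j], dtype=complex)        # unit-norm pure state
rho = np.outer(psi, psi.conj())

d = len(psi)
probs = []
for i in range(d):
    m = np.zeros(d, dtype=complex)
    m[i] = 1.0                                    # basis state |m_i>
    probs.append(np.real(np.trace(rho @ np.outer(m, m.conj()))))

print(probs, sum(probs))                          # [0.36, 0.64], summing to 1
```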

3.2. Methodology

QEDFM consists of four components: sequence preprocessing, an embedding module, a dependency fusion module, and sentiment recognition. First, sentence dependency relations and sequential feature representations are extracted using various methods. Next, the sequential features are encoded into superposition states through the quantum embedding module, while the dependency relations are constructed into density matrices using complex-valued embeddings as input for the dependency fusion module. Subsequently, we employ a quantum cross-attention mechanism to treat the dependency relation features as an information source, allowing them to interact with the text sequence features. This approach facilitates bidirectional attention and information exchange between the dependency relation features and the text sequence features, thereby enhancing the model’s comprehension of sentence structure and semantics. Finally, we input the resulting aspect features into a classifier to recognize sentiments.

3.2.1. Sequence Preprocessing

For dependency parsing, we employ Stanford CoreNLP (https://stanfordnlp.github.io/CoreNLP, accessed on 15 December 2024) to construct the syntactic tree for each sequence. Through this method, we can obtain the corresponding density matrix, with element values representing the types of dependency relations. For the text encoder, we adopt the pre-trained BERT model [34] as the lexical encoder for computing sequential representations. The process is as follows:
$W = \mathrm{embedding}(S)$
$W_0, CLS = \mathrm{BERT}(W)$
where $W \in \mathbb{R}^{n \times d_w}$ denotes the $d_w$-dimensional word embedding matrix of the sentence $S$, and $CLS \in \mathbb{R}^{d}$ represents the representation of a special token used for global pooling, specifically the output of the BERT pooling layer.
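A minimal sketch of this encoding step is shown below, assuming the Hugging Face transformers library; the exact preprocessing in the released code may differ.

```python
# Illustrative sketch of the sequence-encoding step with Hugging Face
# transformers: per-token hidden states W_0 and the pooled CLS representation.
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertModel.from_pretrained("bert-base-uncased")

sentence = "great food but the service was dreadful !"
inputs = tokenizer(sentence, return_tensors="pt")
outputs = model(**inputs)

W0 = outputs.last_hidden_state      # token representations, shape (1, n, d_w)
CLS = outputs.pooler_output         # output of the BERT pooling layer
```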

3.2.2. Embedding Model

Complex-valued Embedding Module: The complex-valued embedding module is designed to construct the density matrix for sentence dependency relations, representing semantic dependency relations as pure states within a Hilbert space. This module embeds the dependency structure of sentences into a complex-valued Hilbert space through the form of complex tensors. In this module, we first map the structural information of the original dependency relations into real-part vectors u and imaginary-part vectors v through real-part embedding and imaginary-part embedding modules, respectively. Here, d represents the real-valued vector of the dependency relations. The formula is as follows:
$u = \mathrm{Embedding}_r(d)$
$v = \mathrm{Embedding}_i(d)$
Relying on the representation of complex numbers, these two vectors jointly constitute the complex-valued embedding vector $w$, calculated as follows:
$w = u + iv$
where $w$ represents the complex-valued embedding vector, $u$ and $v$ are the real and imaginary parts of the embedding of the dependency relation information, respectively, and $i$ is the imaginary unit ($i = \sqrt{-1}$). This complex-valued density matrix not only provides a higher-dimensional embedding of the structural information but also lays the foundation for complex-valued operations in subsequent modules. Because both the coherence embedding and the quantum cross-attention mechanism operate in the complex domain, the complex-valued density matrix naturally supports these operations in complex space, enabling deep interactions between dependency relations and quantum state encodings. The coupling between modules designed in this way fully exploits the characteristics of complex-valued encodings, thereby enhancing the expressive power of the model in modeling complex dependency relations and semantic information.
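A possible PyTorch realization of this module is sketched below; the class and parameter names are illustrative and not taken from the released code.

```python
# Sketch of the complex-valued embedding of dependency relation types:
# separate real- and imaginary-part embeddings combined as w = u + iv.
import torch
import torch.nn as nn

class ComplexDependencyEmbedding(nn.Module):
    def __init__(self, num_relation_types: int, dim: int):
        super().__init__()
        self.embed_real = nn.Embedding(num_relation_types, dim)   # Embedding_r
        self.embed_imag = nn.Embedding(num_relation_types, dim)   # Embedding_i

    def forward(self, dep_ids: torch.Tensor) -> torch.Tensor:
        u = self.embed_real(dep_ids)        # real part
        v = self.embed_imag(dep_ids)        # imaginary part
        return torch.complex(u, v)          # w = u + iv

# dep_ids: integer codes of the dependency relation types in one sentence.
dep_ids = torch.tensor([3, 7, 1, 0])
w = ComplexDependencyEmbedding(num_relation_types=50, dim=100)(dep_ids)
print(w.shape, w.dtype)                     # torch.Size([4, 100]) torch.complex64
```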
Quantum Embedding Module: In quantum mechanics, coherence refers to the existence of a definite phase relationship between quantum states, allowing the system to be in a superposition state. Coherence is one of the important characteristics distinguishing quantum systems from classical systems, enabling quantum states to be represented in complex-valued spaces through amplitude and phase. Therefore, we choose to simulate the fusion of multiple meanings between words by constructing superpositions among them within the sequence. In this paper, to satisfy the criteria for quantifying coherence, we fix a set of orthonormal bases $\{|e_1\rangle, |e_2\rangle, \ldots, |e_n\rangle\}$ in an $n$-dimensional Hilbert space $\mathcal{H}^n$, which satisfy the orthonormality condition $\langle e_i | e_j \rangle = \delta_{ij}$, $i, j \in \{1, 2, \ldots, n\}$, where $\delta_{ij}$ is the Kronecker delta. The orthonormal bases we choose are the computational bases, denoted as $|e_i\rangle = [0, \ldots, 1, \ldots, 0]^T$, where the $i$-th position is 1 and the others are 0. We utilize the output $h_i = (h_{i1}, h_{i2}, \ldots, h_{in})$ from the last layer of BERT, where $n$ denotes the embedding length and each $h_i$ represents the vector of a single word in the sequence. We map it onto the aforementioned fixed orthonormal basis to construct the quantum state $|h_i\rangle = h_{i1}[1, 0, \ldots, 0] + h_{i2}[0, 1, \ldots, 0] + \cdots + h_{in}[0, 0, \ldots, 1]$, where $h_{ij}$ represents the component of $h_i$ in the $j$-th dimension, i.e., the projection coefficient of $h_i$ onto the corresponding basis vector. Taking the first word in the sequence as an example, it can be represented in Dirac notation as $|\phi_1\rangle = h_{11}|e_1\rangle + h_{12}|e_2\rangle + \cdots + h_{1n}|e_n\rangle$, where $|e_j\rangle$ denotes an element of the orthonormal basis and the $h_{1j}$ are the corresponding superposition coefficients satisfying the normalization condition $\sum_{j=1}^{n} |h_{1j}|^2 = 1$. At this point, we can quantify the different degrees of semantic relatedness of different phrases or individual words in a sequence using the relative entropy of coherence. Based on the calculation of coherence introduced in Section 3.1.3, we take the following sentence as an example: “great food but the service was dreadful!”. The entropy values of the phrases great food and service was dreadful are 3.0041 and 3.4688, which are significantly higher than those of other phrases. We construct a mapping for the attention mask in the same manner and utilize it to create a complex-valued representation. Quantum states can be represented through density matrices, which describe the characteristics of the quantum state of the system. To convert the quantum state of each word into a density matrix, we perform a complex outer product, realizing the density matrix representation of the quantum state $|\phi_i\rangle$, which provides a probabilistic representation. The formula is as follows:
$\rho_i = |\phi_i\rangle\langle\phi_i|$
To further model the coherence and superposition properties between quantum states, we mix multiple quantum states through weighted combination and density matrix calculations, enabling information fusion among different words in the sequence and establishing indirect associations between words. This can be expressed in the following form:
$\rho_{\mathrm{mixed}} = \sum_{i=1}^{N} w_i \, \rho_i = \sum_{i=1}^{N} w_i \, |\phi_i\rangle\langle\phi_i|$
where $w_i$ represents the weight assigned to the state of the $i$-th word, satisfying $\sum_{i=1}^{N} w_i = 1$, and is computed from the hidden states $O$ of BERT through a softmax function, i.e., $w_i = \mathrm{softmax}(O)_i$. $\rho_i$ denotes the density matrix of the $i$-th word, and $|\phi_i\rangle$ represents the quantum state of the $i$-th word in the sentence. This global weighted summation indirectly simulates the interrelationships within the system, akin to the idea in quantum systems that relationships between different particles are expressed through the superposition of multi-particle states. The quantum state $|\phi_i\rangle\langle\phi_i|$ of each word influences the overall state $\rho_{\mathrm{mixed}}$ of the sequence through the weight $w_i$. These weights perform a linear transformation on the density matrices through matrix multiplication, analogous to the operation of quantum gates on quantum states. Compared to the quantum circuits required for entanglement embedding, this approach significantly reduces computational complexity.
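A simplified PyTorch sketch of this mixing step follows; the per-word weighting here (a softmax over the mean of each hidden state) is an illustrative stand-in for the paper's $w_i = \mathrm{softmax}(O)$ and may differ from the released code.

```python
# Sketch of the mixed-state construction: normalize each word vector to a
# pure state |phi_i>, form rho_i = |phi_i><phi_i|, and mix with softmax weights.
import torch
import torch.nn.functional as F

hidden = torch.randn(6, 100)                              # n words x d hidden states
phi = F.normalize(hidden, dim=-1).to(torch.complex64)     # pure states |phi_i>

# One scalar weight per word (illustrative scoring), softmax-normalized to sum to 1.
weights = F.softmax(hidden.mean(dim=-1), dim=0)

# rho_mixed = sum_i w_i |phi_i><phi_i|
rho_i = torch.einsum("id,ie->ide", phi, phi.conj())       # per-word outer products
rho_mixed = torch.einsum("i,ide->de", weights.to(torch.complex64), rho_i)

print(torch.trace(rho_mixed).real)                        # ~1.0: valid density matrix
```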
The mixing of quantum states reflects the contributions of various states to the overall features, thereby achieving a coherent representation among words. It determines how adjacent words interact or superpose with one another, shaping the relationships between words within the module and, to some extent, influencing their relative positions or degrees of association in the quantum state space. Ultimately, the superposition state is mapped back to classical information through projection measurement. In other words, the constructed density matrix is subjected to projection measurement to obtain a classical probability distribution, which is then utilized for subsequent model fusion and decision-making. Specifically, we use the constructed complex-valued measurement operator P to operate on the input density matrix ρ to compute the probability distribution or outcome of the measurement. The measurement operator P consists of two parts, the real part P r e a l and the imaginary part P i m a g , and is formulated as follows:
$P_{\mathrm{real}} = K_{\mathrm{real}} \cdot K_{\mathrm{real}}^{T} + K_{\mathrm{imag}} \cdot K_{\mathrm{imag}}^{T}$
$P_{\mathrm{imag}} = K_{\mathrm{imag}} \cdot K_{\mathrm{real}}^{T} + K_{\mathrm{real}} \cdot K_{\mathrm{imag}}^{T}$
$P = P_{\mathrm{real}} + i \cdot P_{\mathrm{imag}}$
where $K_{\mathrm{real}}$ is the real-part kernel, initialized as the identity matrix $I$, and $K_{\mathrm{imag}}$ is the imaginary-part kernel, initialized as a zero matrix. Together they form a complex-valued projection kernel used to define a subspace projection in complex-valued space. The projection measurement can be written as follows:
$\rho_{\mathrm{out}} = \mathrm{tr}(P \rho_{\mathrm{mixed}})$
where $\mathrm{tr}(\cdot)$ denotes the trace operation. To integrate the quantum representation with the classical output of BERT, we concatenate the original hidden states $O$, the density matrix representation $\rho_{\mathrm{mixed}}$, and the quantum measurement result $\rho_{\mathrm{out}}$ to form the final feature representation:
$F = [O, \rho_{\mathrm{mixed}}, \rho_{\mathrm{out}}]$
This fused representation not only preserves BERT’s robust context modeling capabilities but also enhances the global expressive power and inherent structural information of the feature representation through quantum coherence.
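The projection measurement described above can be sketched as follows in PyTorch; the kernel initializations follow the text, while the module name and the toy usage are illustrative assumptions.

```python
# Sketch of the complex-valued projection measurement: P is assembled from a
# real kernel (initialized to I) and an imaginary kernel (initialized to 0),
# and the measurement outcome is rho_out = tr(P rho_mixed).
import torch
import torch.nn as nn

class ComplexProjectionMeasurement(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.k_real = nn.Parameter(torch.eye(dim))          # K_real, init I
        self.k_imag = nn.Parameter(torch.zeros(dim, dim))   # K_imag, init 0

    def forward(self, rho_mixed: torch.Tensor) -> torch.Tensor:
        p_real = self.k_real @ self.k_real.T + self.k_imag @ self.k_imag.T
        p_imag = self.k_imag @ self.k_real.T + self.k_real @ self.k_imag.T
        P = torch.complex(p_real, p_imag)
        return torch.trace(P @ rho_mixed)                   # tr(P rho_mixed)

# Toy usage with a pure-state density matrix of dimension 100.
phi = torch.randn(100)
phi = (phi / phi.norm()).to(torch.complex64)
rho_mixed = phi[:, None] * phi.conj()[None, :]
rho_out = ComplexProjectionMeasurement(dim=100)(rho_mixed)
print(rho_out)
```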

3.2.3. Dependency Fusion Module

We use the quantum cross-attention mechanism to fuse the utterance sequence and dependency features; the main process is shown in Figure 2. In the figure, $\rho_\alpha$ is the sequence feature representation after quantum embedding, and $\rho_\beta$ is the dependency feature after complex-valued embedding. Both are fed into three Q-linear layers, which output K, V, and Q, respectively, to achieve feature fusion through the cross-attention mechanism. A Q-linear layer is the analogue of a linear layer under quantum unitary evolution, so that it matches the operations of the density matrix, as exemplified by $\rho_\alpha$ in Figure 2:
$Q = U \rho_\alpha U^{\dagger}$
where $U$ is a unitary matrix and $Q$ is a density matrix, while the corresponding $K$ and $V$ are obtained from $\rho_\beta$ in the same way with different unitary matrices. For pure state vectors, the attention score can be computed using the inner product; however, since the input superposition states are represented as density matrices, we instead compute the trace of the product of two density matrices, following Busemeyer [30]:
Figure 2. The structure of QCA.
$\mathrm{tr}(\rho_\alpha \rho_\beta) = \mathrm{tr}\Big(\sum_{i,j} p_i p_j |\psi_{\alpha,i}\rangle\langle\psi_{\alpha,i}|\psi_{\beta,j}\rangle\langle\psi_{\beta,j}|\Big) = \mathrm{tr}\Big(\sum_{i,j} p_i p_j \langle\psi_{\alpha,i}|\psi_{\beta,j}\rangle\, |\psi_{\alpha,i}\rangle\langle\psi_{\beta,j}|\Big) = \sum_{i,j} p_i p_j \big|\langle\psi_{\alpha,i}|\psi_{\beta,j}\rangle\big|^2$
Equation (19) shows that $\mathrm{tr}(\rho_\alpha \rho_\beta)$ is a weighted sum of squared inner products of pure states. In fact, this is a generalization of the inner product from vectors to density matrices, called the trace inner product [35,36]. Therefore, we compute the attention score between K and Q via the trace inner product, which is represented as
$S = \mathrm{tr}(K_\beta Q_\alpha)$
$\mu = \mathrm{Softmax}(S)$
$\rho_{\alpha\beta} = \mu V$
where $\rho_{\alpha\beta}$ denotes the result of the operation with $\rho_\alpha$ as Q and $\rho_\beta$ as K; this is one direction of the cross-attention computation. The dependency features are then used as Q and the utterance sequence features as K and V, and the same operation yields $\rho_{\beta\alpha}$. The results from both directions are combined to yield new features, which can be expressed as
$\rho_\gamma = \rho_{\alpha\beta} + \rho_{\beta\alpha} = \mathrm{Softmax}(\mathrm{tr}(K_\beta Q_\alpha))\, V_\beta + \mathrm{Softmax}(\mathrm{tr}(K_\alpha Q_\beta))\, V_\alpha$
where $Q_i$, $K_i$, and $V_i$ denote the different feature sources used as Q, K, and V, and $\rho_\gamma$ is the final density matrix.
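One direction of QCA can be sketched as follows in PyTorch. The trace-inner-product scoring and softmax weighting follow the equations above, while the handling of the unitaries and the per-pair looping are simplified assumptions rather than the released implementation.

```python
# Sketch of one direction of the quantum cross-attention: evolve the density
# matrices with unitary transforms (the Q-linear step), score pairs with the
# trace inner product tr(K Q), softmax the scores, and mix the V matrices.
import torch
import torch.nn.functional as F

def rand_pure_rho(d):
    v = torch.randn(d, dtype=torch.cfloat)
    v = v / v.norm()
    return v[:, None] * v.conj()[None, :]          # |psi><psi|

def trace_inner_product(A, B):
    return torch.trace(A @ B).real                 # tr(A B), real-valued score

def qca_one_direction(rho_q, rhos_kv, U_q, U_k, U_v):
    Q = U_q @ rho_q @ U_q.conj().T                 # unitary evolution of the query
    scores = torch.stack([trace_inner_product(U_k @ r @ U_k.conj().T, Q)
                          for r in rhos_kv])
    mu = F.softmax(scores, dim=0)                  # attention weights
    V = [U_v @ r @ U_v.conj().T for r in rhos_kv]
    return sum(w * v for w, v in zip(mu.to(torch.cfloat), V))

# Toy usage: one sequence density matrix attending over four dependency matrices.
d = 8
U = torch.linalg.qr(torch.randn(d, d, dtype=torch.cfloat))[0]   # shared toy unitary
rho_alpha = rand_pure_rho(d)
rhos_beta = [rand_pure_rho(d) for _ in range(4)]
rho_ab = qca_one_direction(rho_alpha, rhos_beta, U, U, U)
print(rho_ab.shape)                                # torch.Size([8, 8])
```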

3.2.4. Sentiment Classification

We use the output of QCA as the final word representation and apply average pooling over all word representations belonging to the same aspect to extract the aspect representation, formulated as
$r = \frac{1}{k} \sum_{j=1}^{k} \rho_{\gamma}^{j}$
where $r \in \mathbb{R}^{d}$ is the aspect representation, $j$ indexes the words of the aspect in the word sequence, and $k$ is the number of words in the aspect. Then, $r$ is concatenated with the semantic representation CLS of the entire input sequence and fed into a classifier consisting of a linear function and a softmax to produce a probability distribution over the polarity decision. Finally, the parameters are optimized with the cross-entropy loss.
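A minimal sketch of this classification head follows (PyTorch); the dimensions and names are illustrative, and each word is assumed to have already been reduced to a real d-dimensional representation.

```python
# Sketch of the classification head: average-pool the per-word representations
# of a k-word aspect, concatenate with the CLS vector, and classify into the
# three polarities with a linear + softmax layer.
import torch
import torch.nn as nn

d = 100
aspect_word_feats = torch.randn(3, d)     # k = 3 aspect-word representations
cls_vec = torch.randn(768)                # BERT [CLS] representation

r = aspect_word_feats.mean(dim=0)         # average pooling over the k aspect words
features = torch.cat([r, cls_vec], dim=-1)

classifier = nn.Linear(d + 768, 3)        # positive / negative / neutral
probs = torch.softmax(classifier(features), dim=-1)
print(probs)
```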

3.3. Datasets

We conducted experiments on three publicly available ABSA benchmark datasets: Restaurant and Laptop from SemEval 2014 [37], which consist of reviews related to restaurants and laptops, respectively; and the Twitter dataset, which was constructed by Dong et al. [38], consisting of Twitter posts. Table 1 shows the statistics of the datasets.
Table 1. The dataset in our experiment. #Pos, #Neg, and #Neu denote the number of instances with positive, negative, and neutral moods, respectively.

4. Results

This section describes the comparison experiments, ablation experiments, and visualization experiments that we performed. The model was trained using the Adam optimizer with an initial learning rate of 0.001 and Xavier uniform initialization. The optimal hyperparameter settings were selected based on performance across multiple datasets during development. The number of epochs was set to 15, with a batch size of 16. For dropout regularization, we used 0.5 for input and feedforward layers, and 0.2 for attention modules. The BERT encoder was initialized with bert-base-uncased and fine-tuned with a learning rate of 2 × 10−5 and a dropout of 0.5, using a weight decay of 0.01. The source code is publicly available at Github (https://github.com/Drake8023/QEDFM, accessed on 15 May 2025).

4.1. Baselines

We compare QEDFM with state-of-the-art ABSA baselines, all of which adopt a pre-trained BERT approach:
BERT [34] is an ordinary BERT model that makes predictions by feeding sentence–aspect pairs and using a representation of [CLS].
BERT-PT [11] post-trains BERT using an external domain-specific corpus to improve the performance of the model.
R-GAT-BERT [13] is R-GAT, which uses pre-trained BERT instead of BiLSTM as an encoder.
DGEDT-BERT [15] is the DGEDT model, which uses a pre-trained BERT instead of BiLSTM as an encoder.
BERT4GCN [17] enhances GCN using the outputs of intermediate BERT layers and positional information between words.
TGCN-BERT [16] distinguishes relation types by attention and learns features from multiple GCN layers using a collection of attention layers.
SSEGCN-BERT [18] acquires semantic information through attention and equips it with syntactic information with minimum tree distance.
AG-VSR-BERT [19] injects semantic information by correcting incorrect dependency connections through Attention-Assisted Graph Representation (A2GR) and Variable Sentence Representation (VSR).
C-BERT [25] constructs complex-valued word representations by treating real and imaginary contextual representations as linear functions of the dropout of the BERT output vector.
MHA+RGAT+BERT-ATE-APC [20] is a multitask learning model that integrates BERT and RGAT models for APC and ATE tasks, while associating dependency sequences with aspect extraction via MHA.
DCASAM [21] integrates BERT while using Context Dynamic Masking (CDM) and Talking Head Attention (THA) mechanisms to extract global and local contextual features, and finally captures structural information using a densely connected GCN.

4.2. Main Results

Table 2 presents the results of the experiments conducted on the Restaurant, Laptop, and Twitter datasets. The findings indicate that QEDFM consistently yields the best results in most cases.
Table 2. Comparison of models on the various datasets. The best results are in bold, and the second-best results are underlined. † indicates that the baseline was rerun using its open-source code and included in the significance test.
On the Restaurant dataset, our QEDFM model outperforms the other ABSA baseline models in both accuracy (Acc.) and F1 score (F1) by margins of 0.53% to 0.71%. Although QEDFM does not achieve the highest accuracy and F1 score on the Laptop and Twitter datasets, it remains competitive and reaches second-best performance. Several factors may contribute to this outcome, including the effectiveness of the DGEDT and TGCN models. These models leverage GNNs to enhance aspectual representations by learning syntactic dependency trees or induced trees, which capture additional structural patterns for syntactic analysis, thereby improving performance. Additionally, certain special symbols in the comments within these datasets may introduce noise, partially obscuring the semantic information.
We also tested our model for statistical significance by bootstrap resampling over 1000 iterations against the baseline model labeled by the dagger symbol in Table 2. For example, when comparing our model with SSEGCN, the 95% confidence interval for the difference in accuracy is [0.0861, 0.1297], with an average difference of 0.1079, indicating a statistically significant improvement. Similar results were observed on the Laptop and Twitter datasets. Overall, our approach of incorporating quantum embedding to represent the fusion of semantic features with dependency information demonstrates comparable superiority to classical modeling approaches for the current ABSA task.
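The bootstrap procedure described above can be reproduced with a short script such as the following sketch (NumPy); the function and variable names are illustrative.

```python
# Sketch of bootstrap significance testing: resample the test set 1000 times
# with replacement, record the accuracy difference between two systems, and
# read off a 95% percentile confidence interval.
import numpy as np

def bootstrap_accuracy_diff(preds_a, preds_b, gold, n_iter=1000, seed=0):
    rng = np.random.default_rng(seed)
    preds_a, preds_b, gold = map(np.asarray, (preds_a, preds_b, gold))
    n = len(gold)
    diffs = []
    for _ in range(n_iter):
        idx = rng.integers(0, n, size=n)            # resample with replacement
        acc_a = (preds_a[idx] == gold[idx]).mean()
        acc_b = (preds_b[idx] == gold[idx]).mean()
        diffs.append(acc_a - acc_b)
    diffs = np.array(diffs)
    return diffs.mean(), np.percentile(diffs, [2.5, 97.5])

# mean_diff, (lo, hi) = bootstrap_accuracy_diff(model_preds, baseline_preds, labels)
```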

4.3. Ablation Experiment

We performed ablation experiments to demonstrate the effectiveness of our quantum cross-attention mechanism and quantum encoding. We made three sets of comparisons. In the QE-QA configuration, attention is applied in one direction only, with the utterance sequence features as Q and the dependency features as K and V, and the resulting feature matrix is used as input for subsequent classification. In the QE configuration, we directly superimpose the feature matrix obtained after quantum embedding onto the dependencies derived from complex-valued embedding, so that the dependencies are preserved for comparison with the other methods. To demonstrate the effectiveness of our quantum encoding, we also added a configuration without either component (w/o any), using solely the output of the pre-trained model. The results are presented in Table 3. First, the comparison between QE and w/o any shows that our quantum embedding approach yields a significant improvement over the pre-trained model. The single-direction quantum attention applied on top of this provides a slight further gain in accuracy and F1 score due to the incorporation of dependency information. Furthermore, with the quantum cross-attention mechanism, the resulting accuracy is 0.54% to 0.95% higher than that of all other configurations, while the F1 score improves by 0.09% to 1.57%, significantly enhancing the fusion of the dependency information. Overall, our quantum embedding approach alone considerably improves the results of the pre-trained model, and when combined with the attention mechanism, it effectively integrates the dependency information, highlighting the superiority of our quantum embedding and quantum cross-attention mechanism.
Table 3. Results of ablation experiments on the Restaurant dataset. The best results are in bold.

4.4. Case Study

We present several test cases in Table 4 to demonstrate the ability of QEDFM to differentiate between various aspects. These reviews were randomly selected from different datasets. The results indicate that QEDFM is more effective at determining the sentiment polarity of one or more aspects compared to the baselines. This improvement is attributed to the deep fusion of dependency features facilitated by the quantum cross-attention mechanism, which highlights the benefits of quantum-inspired modeling for syntactic and semantic context integration.
Table 4. A number of test cases were extracted from both datasets, where P stands for positive, N for negative, and O for neutral sentiment polarity. These examples illustrate predictions on a three-class aspect-based sentiment classification task.

4.5. Post Hoc Interpretability

In quantum information theory, the quantum relative entropy $S$ serves as a measure of the distinguishability between two quantum states. This paper provides a quantitative representation of the relationships between different words after quantum embedding by quantifying coherence based on relative entropy. The relative entropy of coherence was introduced in Section 3.1.3, and its calculation can be expressed as follows:
$C_r(\varrho) = \min_{\delta \in \mathcal{I}} S(\varrho \| \delta) = S(\varrho_{\mathrm{diag}}) - S(\varrho) = -\sum_{x=1}^{K} \lambda_x \log(\lambda_x) + \sum_{y=1}^{K} \lambda_y \log(\lambda_y)$
where $\lambda_x$ denotes the main diagonal elements only (the eigenvalues of $\varrho_{\mathrm{diag}}$), $\lambda_y$ denotes the eigenvalues of the full state, and $K$ is the minimum dimension of the subsystem.
Table 5 shows the lexical phrases with the highest and lowest values after we sorted the test data in both the Laptop and Restaurant datasets by their inter-word relative entropy of coherence following quantum embedding. Since the framework establishes that the maximum attainable coherence is $\log(d)$, and given that the dimension of our density matrix is 100, the values fall within a range of 4. We consider values less than 2.5 to indicate low coherence and values greater than 2.5 to indicate high coherence. It can be observed that most word pairs exhibiting high coherence are fixed collocations or combinations of words that refer to specific objects. For example, ask for and after all are abstract fixed collocations with high degrees of association, making them difficult to separate. Phrases like the thunder refer to specific objects, and there is a clear modifier–referent relationship between the and thunder, which may indicate strong semantic coupling between them. The density matrices of these phrases may therefore exhibit more complex interactions, leading to higher coherence. Such phrases play more important semantic roles in context and are information-dense, requiring a more precise capture of their semantic associations to enhance the model’s understanding. Phrases with lower relative entropy, on the other hand, mostly serve as grammatical connecting structures; for example, to be and there was lack strong semantic connections and exhibit weak inter-word dependencies, and words like is and a function mainly as grammatical auxiliaries. They therefore express simple, basic syntactic functions or state descriptions that carry less information, making them more suitable as grammatical auxiliary structures with relatively straightforward roles. The model can appropriately simplify its processing of these phrases, resulting in correspondingly lower coherence.
Table 5. Semantic relative entropy of coherence in the two datasets.

4.6. Visualization

We took a further step to visualize the effect of the model's encoding. Figure 3a,b show the inter-word correlations, plotted using cosine similarity over the text sequence features, after quantum embedding and after the pre-trained model alone, respectively. The corresponding sentence is “great food but the service was dreadful!”. In human terms, the key to the sentiment polarity of this sentence lies in the clause after but, where the most important sentiment information is that the service is dreadful, indicating a negative sentiment tendency toward the subject. We can see that after quantum embedding, our model learns richer semantic information about the whole sentence, focusing on the contrast between the words service and dreadful and ignoring great food, which interferes with the sentiment expression. In contrast, the inter-word relationships without quantum embedding still place considerable relevance on the first half of the sentence.
Figure 3. Comparison of inter-word cosine similarity with and without quantum embedding.
To further illustrate the advantages of quantum embedding in semantic representation, we constructed a kernel density plot of word pair distances to compare the performance of quantum embedding with that of a pre-trained model (BERT) in terms of the density matrix after processing sequences (see Figure 4). We observe that the word pair distances in quantum embedding are concentrated in shorter distance intervals, with density values in this region significantly higher than those of the pre-trained model. This indicates that superposition state encoding can more effectively capture semantically related word pairs, thereby enhancing semantic focus. In contrast, the word pair distance distribution produced by the pre-trained model is relatively uniform, suggesting that it possesses weaker semantic distinction capabilities for word pairs and is unable to effectively differentiate between semantically related and unrelated word pairs. The experimental results demonstrate that quantum embedding not only outperforms the pre-trained model in terms of density concentration but also exhibits stronger focusing and discrimination capabilities for semantically related word pairs, which holds significant application value in semantic understanding tasks.
Figure 4. Kernel density map.

5. Conclusions and Discussion

In this paper, we analyze the advantages of quantum embedding in enhancing semantic understanding, as well as the challenges associated with model interpretability. To address these challenges, we propose a quantum embedding-based approach and present the Dependency Fusion Method for ABSA tasks. By introducing superposition states, we model the inherent uncertainty in human sentiment expressions, enabling the effective integration of sentence sequences that exhibit conflicting sentiment polarities. Furthermore, we propose a novel quantum cross-attention mechanism that leverages dependency relations as a guiding framework, thereby enabling the model to more accurately discern sentence sentiment polarity. Comprehensive experiments conducted on publicly available benchmark datasets validate the efficacy of our proposed method. In future research, we intend to integrate our cross-attention mechanism with graph convolution techniques to enhance the structured information fusion of dependency relations, aiming to achieve more sophisticated feature fusion within superposition state encoding.

Author Contributions

Conceptualization, C.X. and L.S.; methodology, C.X. and L.S.; software, C.X., J.T. and Y.W.; validation, C.X., Y.W. and Q.G.; formal analysis, C.X. and X.W.; investigation, C.X. and L.S.; resources, C.X. and Q.G.; data curation, C.X. and J.T.; writing—original draft preparation, C.X.; writing—review and editing, C.X., L.S., X.W. and Q.G.; visualization, C.X. and Y.W.; supervision, L.S.; project administration, C.X., L.S. and Q.G.; funding acquisition, L.S., Q.G. and X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Natural Science Foundation of China (Nos. 62072362, 12101479) and the Shaanxi Provincial Key Industry Innovation Chain Program (No. 2020ZDLGY07-05), Natural Science Basis Research Plan in Shaanxi Province of China (No. 2021JQ-660), Xi’an Major Scientific and Technological Achievements Transformation Industrialization Project (No. 23CGZH-CYH0008), and Shaanxi Provincial Science and Technology Department Project (No. 2024JC-YBMS-531).

Data Availability Statement

We have used two datasets including the following: SemEval 2014 [37] and Twitter [38].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Tang, D.; Qin, B.; Liu, T. Aspect Level Sentiment Classification with Deep Memory Network. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 214–224. [Google Scholar]
  2. Zhang, C.; Li, Q.; Song, D. Aspect-based Sentiment Classification with Aspect-specific Graph Convolutional Networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 4568–4578. [Google Scholar]
  3. Sun, K.; Zhang, R.; Mensah, S.; Mao, Y.; Liu, X. Aspect-level sentiment analysis via convolution over dependency tree. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 5679–5688. [Google Scholar]
  4. Li, Q.; Wang, B.; Melucci, M. CNM: An Interpretable Complex-valued Network for Matching. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), Minneapolis, MN, USA, 2–7 June 2019; pp. 4139–4148. [Google Scholar]
  5. Chen, Y.; Pan, Y.; Dong, D. Quantum language model with entanglement embedding for question answering. IEEE Trans. Cybern. 2021, 53, 3467–3478. [Google Scholar] [CrossRef] [PubMed]
  6. Yu, Y.; Qiu, D.; Yan, R. A quantum entanglement-based approach for computing sentence similarity. IEEE Access 2020, 8, 174265–174278. [Google Scholar] [CrossRef]
  7. Zhou, J.; Huang, J.X.; Chen, Q.; Hu, Q.V.; Wang, T.; He, L. Deep learning for aspect-level sentiment classification: Survey, vision, and challenges. IEEE Access 2019, 7, 78454–78483. [Google Scholar] [CrossRef]
  8. Liu, H.; Chatterjee, I.; Zhou, M.; Lu, X.S.; Abusorrah, A. Aspect-based sentiment analysis: A survey of deep learning methods. IEEE Trans. Comput. Soc. Syst. 2020, 7, 1358–1375. [Google Scholar] [CrossRef]
  9. Vo, D.T.; Zhang, Y. Target-dependent twitter sentiment classification with rich automatic features. In Proceedings of the 24th International Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015; pp. 1347–1353. [Google Scholar]
  10. Wang, Y.; Huang, M.; Zhu, X.; Zhao, L. Attention-based LSTM for aspect-level sentiment classification. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 606–615. [Google Scholar]
  11. Xu, H.; Liu, B.; Shu, L.; Yu, P. BERT Post-Training for Review Reading Comprehension and Aspect-based Sentiment Analysis. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1. [Google Scholar]
  12. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
  13. Wang, K.; Shen, W.; Yang, Y.; Quan, X.; Wang, R. Relational Graph Attention Network for Aspect-based Sentiment Analysis. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3229–3238. [Google Scholar]
  14. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. arXiv 2018, arXiv:1710.10903. [Google Scholar]
  15. Tang, H.; Ji, D.; Li, C.; Zhou, Q. Dependency graph enhanced dual-transformer structure for aspect-based sentiment classification. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 6578–6588. [Google Scholar]
  16. Tian, Y.; Chen, G.; Song, Y. Aspect-based sentiment analysis with type-aware graph convolutional networks and layer ensemble. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021; pp. 2910–2922. [Google Scholar]
  17. Xiao, Z.; Wu, J.; Chen, Q.; Deng, C. BERT4GCN: Using BERT Intermediate Layers to Augment GCN for Aspect-based Sentiment Classification. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7–11 November 2021; pp. 9193–9200. [Google Scholar]
  18. Zhang, Z.; Zhou, Z.; Wang, Y. SSEGCN: Syntactic and semantic enhanced graph convolutional network for aspect-based sentiment analysis. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 4916–4925. [Google Scholar]
  19. Feng, S.; Wang, B.; Yang, Z.; Ouyang, J. Aspect-based sentiment analysis with attention-assisted graph and variational sentence representation. Knowl.-Based Syst. 2022, 258, 109975. [Google Scholar] [CrossRef]
  20. Zhao, G.; Luo, Y.; Chen, Q.; Qian, X. Aspect-based sentiment analysis via multitask learning for online reviews. Knowl.-Based Syst. 2023, 264, 110326. [Google Scholar] [CrossRef]
  21. Jiang, X.; Ren, B.; Wu, Q.; Wang, W.; Li, H. DCASAM: Advancing aspect-based sentiment analysis through a deep context-aware sentiment analysis model. Complex Intell. Syst. 2024, 10, 7907–7926. [Google Scholar] [CrossRef]
  22. Sordoni, A.; Nie, J.Y.; Bengio, Y. Modeling term dependencies with quantum language models for IR. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 28 July–1 August 2013; pp. 653–662. [Google Scholar]
  23. Li, S.; Hou, Y. Quantum-inspired model based on convolutional neural network for sentiment analysis. In Proceedings of the 2021 4th International Conference on Artificial Intelligence and Big Data (ICAIBD), Chengdu, China, 28–31 May 2021; pp. 347–351. [Google Scholar]
  24. Gkoumas, D.; Li, Q.; Dehdashti, S.; Melucci, M.; Yu, Y.; Song, D. Quantum Cognitively Motivated Decision Fusion for Video Sentiment Analysis. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, AAAI 2021, Online, 2–9 February 2021; pp. 827–835. [Google Scholar]
  25. Zhao, Q.; Hou, C.; Xu, R. Quantum-Inspired Complex-Valued Language Models for Aspect-Based Sentiment Classification. Entropy 2022, 24, 621. [Google Scholar] [CrossRef] [PubMed]
  26. Zhao, X.; Wan, H.; Qi, K. QPEN: Quantum projection and quantum entanglement enhanced network for cross-lingual aspect-based sentiment analysis. In Proceedings of the Thirty-Eighth AAAI Conference on Artificial Intelligence and Thirty-Sixth Conference on Innovative Applications of Artificial Intelligence and Fourteenth Symposium on Educational Advances in Artificial Intelligence, Vancouver, BC, Canada, 20–27 February 2024; pp. 19670–19678. [Google Scholar]
  27. Khrennikov, A. Ubiquitous Quantum Structure; Springer: Berlin/Heidelberg, Germany, 2010. [Google Scholar]
  28. Alodjants, A.; Tsarev, D.; Avdyushina, A.; Khrennikov, A.Y.; Boukhanovsky, A. Quantum-inspired modeling of distributed intelligence systems with artificial intelligent agents self-organization. Sci. Rep. 2024, 14, 15438. [Google Scholar] [CrossRef] [PubMed]
  29. Nielsen, M.A.; Chuang, I.L. Quantum Computation and Quantum Information; Cambridge University Press: Cambridge, UK, 2010. [Google Scholar]
  30. Busemeyer, J. Quantum Models of Cognition and Decision; Cambridge University Press: Cambridge, UK, 2012. [Google Scholar]
  31. Baumgratz, T.; Cramer, M.; Plenio, M.B. Quantifying coherence. Phys. Rev. Lett. 2014, 113, 140401. [Google Scholar] [CrossRef] [PubMed]
  32. Fell, L.; Dehdashti, S.; Bruza, P.; Moreira, C. An Experimental Protocol to Derive and Validate a Quantum Model of Decision-Making. In Proceedings of the Annual Meeting of the Cognitive Science Society, Montreal, QC, Canada, 24–27 July 2019; Volume 41. [Google Scholar]
  33. Halmos, P.R. Finite-Dimensional Vector Spaces; Courier Dover Publications: Garden City, NY, USA, 2017. [Google Scholar]
  34. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT), Minneapolis, MN, USA, 2–7 June 2019; Volume 1, pp. 4171–4186. [Google Scholar]
  35. Balkır, E. Using Density Matrices in a Compositional Distributional Model of Meaning. Master’s Thesis, University of Oxford, Oxford, UK, 2014. [Google Scholar]
  36. Zhang, P.; Niu, J.; Su, Z.; Wang, B.; Ma, L.; Song, D. End-to-end quantum-like language models with application to question answering. In Proceedings of the 32nd AAAI Conference on Artificial Intelligence, AAAI 2018, New Orleans, LA, USA, 2–7 February 2018; pp. 5666–5673. [Google Scholar]
  37. Pontiki, M.; Galanis, D.; Pavlopoulos, J.; Papageorgiou, H.; Androutsopoulos, I.; Manandhar, S. SemEval-2014 Task 4: Aspect Based Sentiment Analysis. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014); Nakov, P., Zesch, T., Eds.; Association for Computational Linguistics: Dublin, Ireland, 2014; pp. 27–35. [Google Scholar] [CrossRef]
  38. Dong, L.; Wei, F.; Tan, C.; Tang, D.; Zhou, M.; Xu, K. Adaptive recursive neural network for target-dependent twitter sentiment classification. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Baltimore, MD, USA, 13–15 June 2014; pp. 49–54. [Google Scholar]