Article

KAN-and-Attention Based Precoding for Massive MIMO ISAC Systems

1
School of Computer Science and Communication Engineering, Jiangsu University, Zhenjiang 212013, China
2
Wuhan Maritime Communication Research Institute, Wuhan 430079, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(16), 3232; https://doi.org/10.3390/electronics14163232
Submission received: 16 July 2025 / Revised: 8 August 2025 / Accepted: 11 August 2025 / Published: 14 August 2025
(This article belongs to the Section Microwave and Wireless Communications)

Abstract

Precoding technology is one of the core technologies that significantly impacts the performance of massive Multiple-Input Multiple-Output (MIMO) Integrated Sensing and Communication (ISAC) systems. Traditional precoding methods, due to their inherent limitations, struggle to adapt to complex channel conditions. Although more advanced neural network-based precoding schemes can accommodate complex channel environments, they suffer from high computational complexity. To address these issues, this paper proposes a KAN-and-Attention based ISAC Precoding (KAIP) scheme for massive MIMO ISAC systems. KAIP extracts channel interference features through multi-layer attention mechanisms and leverages the nonlinear fitting capability of the Kolmogorov–Arnold Network (KAN) to generate precoding matrices, significantly enhancing system performance. Simulation results demonstrate that compared with conventional precoding schemes, the proposed KAIP scheme exhibits significant performance enhancements, including a 70% increase in sum rate (SR) and a 96% decrease in computation time (CT) compared with fully connected neural network (FCNN) based precoding, and a 4% improvement in received power (RP) over precoding based on convolutional neural networks (CNN).

1. Introduction

Massive Multiple-Input Multiple-Output (MIMO) technology is one of the key enablers of 6G [1]. It can achieve multi-user parallel services on the same time-frequency resource by deploying hundreds of antennas at the base station (BS) [2]. Massive MIMO technology, with its excellent energy efficiency performance and spectrum utilization efficiency, is regarded as the core driving force for the development of next-generation communication systems. Meanwhile, based on the ultra-high spatial degree of freedom, massive MIMO technology is also a supporting technology for Integrated Sensing and Communication (ISAC) [3], which enables joint communications and wireless sensing and has received extensive research attention [4].
In the downlink, beamforming techniques or efficient precoding underpin the excellent functionality of massive MIMO. Related studies include traditional precoding methods [5,6,7,8,9,10] and neural network based precoding methods [11,12,13,14,15,16,17,18,19].
Matched Filter (MF) precoding is the simplest precoding scheme, where the precoding matrix is directly obtained by the conjugate transpose of the channel matrix, but its performance is limited [5]. Therefore, Zero Forcing (ZF) and Minimum Mean Square Error (MMSE) further take the inter-user interference into account, yet their capability still falls short of modern requirements due to inherent limitations [6,7]. Subsequently, more advanced precoding methods are devised. For example, a Gradient Descent (GD) based precoding scheme is proposed in [10], where it is implicitly used to solve the Lagrangian formulation of the precoding problem under power constraints. The authors of [8,9] employed Semidefinite Relaxation (SDR) in precoding design, which reformulates the nonconvex precoding design problems into tractable semidefinite programs (SDPs). However, the high computational complexity restricts the practical use of these approaches.
To further enhance the performance of MIMO systems, neural network-based precoding methods have been proposed. A Liquid Neural Network (LNN) is used in [11] to generate precoding matrices; it utilizes gradient-based optimization to extract high-order channel information and incorporates a residual connection to ease training. Fully connected neural networks (FCNN) are employed in [12,13], which use fully connected layers to map estimated channels to precoding vectors, incorporating a scalar quantization layer to handle discrete phase constraints. The authors of [14,15,16] adopted deep reinforcement learning (DRL) by formulating the NP-hard precoding problem as a Markov decision process (MDP). DRL's model-free nature proves particularly effective in multi-hop and multi-user scenarios, where conventional optimization struggles with non-convexity and high dimensionality. Meta-learning (ML) is leveraged in [17,18,19] by enabling the neural networks to learn optimization strategies across diverse scenarios, thereby improving the efficiency in solving the non-convex precoding problem. By feeding the gradients of precoding matrices into lightweight neural networks and employing nested optimization loops, these frameworks dynamically adjust parameters to navigate complex environments, ensuring consistent performance under varying channel conditions. All of the above studies demonstrate strong precoding performance. However, the use of large-scale networks leads to a steep increase in computational complexity as the number of antennas grows.
In the case of ISAC, where the communication users require high data rate services and the sensing users require accurate target detection and environment sensing capabilities, a multidimensional challenge is posed to the downlink precoding design. To address this issue, we propose a novel KAN and Attention-based ISAC Precoding (KAIP) scheme for massive MIMO ISAC systems. Based on the attention mechanism and Kolmogorov–Arnold Network (KAN) [20], we design a neural network to generate the precoding matrix for both the communication and the sensing users, with consideration of the received power (RP) for the sensing users as well as the sum rate (SR) obtained by the communication users. The detailed contributions of this work are summarized as follows:
  • The precoding problem is formulated for massive MIMO ISAC systems, which takes both the communication users and the sensing users into consideration.
  • We propose the novel KAIP scheme that employs KAN and the self-attention mechanism for precoding design in massive MIMO ISAC systems. The attention mechanism in KAIP deeply extracts interference features from the channel matrix to reduce multi-user interference, while the KAN network is responsible for directly generating the precoding matrix with low complexity.
  • In order to evaluate the performance of KAIP, extensive simulations are carried out. The proposed KAIP is compared with existing approaches such as ZF, MMSE, FCNN-based scheme, and CNN-based scheme. Numerical results show that the proposed KAIP scheme exhibits significant performance enhancements with a rather low complexity.
The rest of the paper is organized as follows: In Section 2, the system model and the problem formulation are provided. The proposed KAIP scheme is introduced in Section 3. The numerical results are presented in Section 4, and conclusions are drawn in Section 5.

2. System Model and Problem Formulation

Consider the downlink of a massive MIMO ISAC system. The BS is equipped with $M$ antennas. There are $N$ single-antenna users in the system, including $N_1$ communication users and $N_2$ sensing users. The distance between different users and the BS varies, and there are obstacles between them. An illustration of the considered scenario is given in Figure 1.
The signal received by the $n$-th user is denoted as $y_n$. Let $\mathbf{y} = [y_1, \ldots, y_N]^T$ be the received signal vector at all the $N$ users, which is expressed as (1).
$$\mathbf{y} = \mathbf{H}\mathbf{W}\mathbf{x} + \mathbf{n},$$
where $\mathbf{H} \in \mathbb{C}^{N \times M}$ is the channel matrix from the BS to the $N$ users; $\mathbf{W} \in \mathbb{C}^{M \times N}$ is the precoding matrix; $\mathbf{x} = [x_1, \ldots, x_N]^T$ with $x_n \sim \mathcal{CN}(0, 1)$ denoting the symbol for the $n$-th user; $\mathbf{n} \in \mathbb{C}^N$ is the noise vector, with $\mathbf{n} \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_N)$.
Using (1), we can obtain the received signal vector of the communication users $\mathbf{y}_c \in \mathbb{C}^{N_1}$ and that of the sensing users $\mathbf{y}_s \in \mathbb{C}^{N_2}$, respectively, as (2).
$$\begin{bmatrix} \mathbf{y}_c \\ \mathbf{y}_s \end{bmatrix} = \begin{bmatrix} \mathbf{H}_c \\ \mathbf{H}_s \end{bmatrix} \begin{bmatrix} \mathbf{W}_c & \mathbf{W}_s \end{bmatrix} \begin{bmatrix} \mathbf{x}_c \\ \mathbf{x}_s \end{bmatrix} + \begin{bmatrix} \mathbf{n}_c \\ \mathbf{n}_s \end{bmatrix},$$
where $\mathbf{H}_c \in \mathbb{C}^{N_1 \times M}$ and $\mathbf{H}_s \in \mathbb{C}^{N_2 \times M}$ are the channel matrices for the $N_1$ communication users and the $N_2$ sensing users, respectively; $\mathbf{W}_c \in \mathbb{C}^{M \times N_1}$ and $\mathbf{W}_s \in \mathbb{C}^{M \times N_2}$ are the corresponding precoding matrices; $\mathbf{x}_c = [x_{c_1}, \ldots, x_{c_{N_1}}]^T$ and $\mathbf{x}_s = [x_{s_1}, \ldots, x_{s_{N_2}}]^T$, where $x_{c_i}$ and $x_{s_j}$ denote the symbols for the $i$-th communication user and the $j$-th sensing user, respectively; $\mathbf{n}_c \in \mathbb{C}^{N_1}$ with $\mathbf{n}_c \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_1})$ and $\mathbf{n}_s \in \mathbb{C}^{N_2}$ with $\mathbf{n}_s \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_{N_2})$ denote the noise vectors of the communication users and the sensing users, respectively.
The received signal at the $i$-th communication user, i.e., $y_{c_i}$, is given by (3).
$$y_{c_i} = x_{c_i} \mathbf{h}_{c_i}^H \mathbf{w}_{c_i} + \sum_{j \neq i} x_{c_j} \mathbf{h}_{c_i}^H \mathbf{w}_{c_j} + \sum_{k=1}^{N_2} x_{s_k} \mathbf{h}_{c_i}^H \mathbf{w}_{s_k} + n_{c_i},$$
where $\mathbf{w}_{c_i}$ and $\mathbf{w}_{s_j}$ are the $i$-th and the $j$-th columns of $\mathbf{W}_c$ and $\mathbf{W}_s$, respectively; $n_{c_i}$ is the noise at the $i$-th communication user; $\sum_{j \neq i} x_{c_j} \mathbf{h}_{c_i}^H \mathbf{w}_{c_j}$ represents the interference from the other communication users and $\sum_{k=1}^{N_2} x_{s_k} \mathbf{h}_{c_i}^H \mathbf{w}_{s_k}$ represents the interference from all the sensing users to the communication user's received signal. $\mathbf{h}_{c_i}^H$ is the $i$-th row of $\mathbf{H}_c$, denoting the channel of the $i$-th communication user, which is modeled as (4).
$$\mathbf{h}_{c_i} = L_{c_i}(d_{c_i}) \, \tilde{\mathbf{h}}_{c_i},$$
where $\tilde{\mathbf{h}}_{c_i}$ denotes the small-scale fading coefficients and $L_{c_i}(d_{c_i})$ denotes the pathloss of the $i$-th communication user, which is at a distance of $d_{c_i}$ from the BS. The pathloss is modeled as $L_{c_i}(d_{c_i}) = c \, d_{c_i}^{-\alpha}$, where $\alpha$ is the pathloss exponent, which usually takes values ranging from 2.0 to 6.0 [21]; $c$ is the pathloss at a reference distance and can be treated as a constant.
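As an illustration, the channel model in (4) can be sketched in a few lines of NumPy. The Rayleigh-style small-scale fading $\tilde{\mathbf{h}} \sim \mathcal{CN}(\mathbf{0}, \mathbf{I}_M)$ and the default constants for $c$ and $\alpha$ are illustrative assumptions, not values fixed by the paper:

```python
import numpy as np

def pathloss(d, c=1.0, alpha=3.0):
    """Pathloss L(d) = c * d^(-alpha), as in Eq. (4)."""
    return c * d ** (-alpha)

def user_channel(M, d, c=1.0, alpha=3.0, rng=None):
    """Channel h = L(d) * h_tilde for one single-antenna user.

    h_tilde ~ CN(0, I_M) models Rayleigh small-scale fading (an
    illustrative assumption; the paper also uses Rician samples)."""
    rng = np.random.default_rng() if rng is None else rng
    h_tilde = (rng.standard_normal(M) + 1j * rng.standard_normal(M)) / np.sqrt(2)
    return pathloss(d, c, alpha) * h_tilde
```

With $\alpha = 3.0$ as in Section 4, doubling the distance reduces the pathloss by a factor of eight, which is why the RP experiments later compare users at 30 m and 40 m.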
For the $i$-th communication user, the signal to interference plus noise ratio (SINR) is given by (5).
$$\mathrm{SINR}_i = \frac{\left| \mathbf{h}_{c_i}^H \mathbf{w}_{c_i} \right|^2}{\sum_{j \neq i} \left| \mathbf{h}_{c_i}^H \mathbf{w}_{c_j} \right|^2 + \sum_{k=1}^{N_2} \left| \mathbf{h}_{c_i}^H \mathbf{w}_{s_k} \right|^2 + \sigma^2}.$$
The achievable rate can be expressed as (6).
$$R_i = \log_2\left(1 + \mathrm{SINR}_i\right).$$
Then the SR of the MIMO system can be obtained as (7).
$$R = \sum_{i=1}^{N_1} R_i.$$
The massive MIMO ISAC system also needs to sense the positions of the sensing users. In this work, we take the RP at the sensing user as a metric for evaluating the system's sensing performance [22]. The RP for the $j$-th sensing user, $P_{s_j}$, is denoted as (8).
$$P_{s_j} = \left| \mathbf{h}_{s_j}^H \mathbf{w}_{s_j} \right|^2,$$
where $\mathbf{h}_{s_j}$ is modeled in the same way as for the communication users. To ensure the sensing accuracy, the RP should be no less than a predefined threshold $\bar{P}$. Therefore, we have (9).
$$P_{s_j} \geq \bar{P}, \quad 1 \leq j \leq N_2.$$
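The communication and sensing metrics in (5)–(8) can be sketched as follows, assuming the rows of `Hc` are the conjugate-transposed user channels $\mathbf{h}_{c_i}^H$; the function names are ours:

```python
import numpy as np

def comm_metrics(Hc, Wc, Ws, sigma2):
    """SINR (Eq. 5), per-user rate (Eq. 6) and sum rate (Eq. 7).

    Rows of Hc are h_{c_i}^H, so (Hc @ Wc)[i, j] = h_{c_i}^H w_{c_j}."""
    G = np.abs(Hc @ Wc) ** 2                     # |h_i^H w_j|^2 for all pairs
    sig = np.diag(G)                             # desired-signal powers
    intra = G.sum(axis=1) - sig                  # interference from other comm users
    inter = (np.abs(Hc @ Ws) ** 2).sum(axis=1)   # interference from sensing beams
    sinr = sig / (intra + inter + sigma2)
    rates = np.log2(1 + sinr)
    return sinr, rates, rates.sum()

def received_power(Hs, Ws):
    """RP for each sensing user, Eq. (8): |h_{s_j}^H w_{s_j}|^2."""
    return np.abs(np.diag(Hs @ Ws)) ** 2
```

A sanity check: with orthogonal channels and matched precoding columns, the interference terms vanish and the SINR reduces to the signal power over $\sigma^2$.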
To this end, we can formulate the precoding problem for massive MIMO ISAC systems as
$$\max_{\mathbf{W}} \; R \quad \mathrm{s.t.} \quad C1: \; \mathrm{Tr}(\mathbf{W}^H \mathbf{W}) = 1, \quad C2: \; P_{s_j} \geq \bar{P}, \; 1 \leq j \leq N_2.$$
This optimization problem is difficult to solve due to the non-convex objective and constraint C2. In the following section, we propose a machine learning-based scheme to solve (10) and introduce the novel KAIP scheme.

3. The Proposed KAN and Attention-Based ISAC Precoding Method

This section introduces the novel KAN and Attention-based ISAC Precoding, i.e., KAIP method. The proposed scheme is mainly composed of the attention layer and the KAN layer. The simple structure not only allows for low-complexity processing but also improves system behavior.

3.1. Network Model of KAIP

The structure of the KAIP network is shown in Figure 2, which consists of an input layer, an output layer, an attention layer, and two hidden KAN layers. To be specific, the input of the network is the channel matrix H , which is fed into the attention layer. The output of the attention layer is then input into the KAN layer, which outputs the precoding matrix. The attention layer is responsible for deeply extracting the interference between different users in the system, which facilitates the subsequent suppression of strongly interfering users. The KAN layers are responsible for directly generating the precoding matrix.
The data flow in the attention layer is shown in Figure 3. There are three learnable parameter matrices in the attention layer, namely $\mathbf{W}_q$, $\mathbf{W}_k$ and $\mathbf{W}_v$. Firstly, $\mathbf{Q}$, $\mathbf{K}$ and $\mathbf{V}$ are obtained by calculating $\mathbf{Q} = \mathbf{H}\mathbf{W}_q$, $\mathbf{K} = \mathbf{H}\mathbf{W}_k$ and $\mathbf{V} = \mathbf{H}\mathbf{W}_v$. Secondly, the score matrix $\mathbf{A}$ is obtained by $\mathbf{A} = \mathbf{Q}\mathbf{K}^T$. To ensure gradient stability during training, each element of $\mathbf{A}$ is divided by $\sqrt{d_K}$, where $d_K$ is the dimension of $\mathbf{K}$. Thirdly, softmax normalization is applied to $\mathbf{A}$ on a row-by-row basis, and finally the output $\mathbf{D}_{out}$ is obtained. The entire computation of the attention layer can be summarized by (11).
$$\mathbf{D}_{out} = \mathrm{softmax}\!\left( \frac{\mathbf{A}}{\sqrt{d_K}} \right) \mathbf{V}.$$
Through the strong feature extraction capability of the attention mechanism, the interference between different users can be extracted from the channel matrix, which facilitates the subsequent assignment of lower gains and weights to the strongly interfering users, thus reducing the interference between multiple users and improving the generalization capability of the algorithm.
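A minimal NumPy sketch of this attention computation, Eq. (11), might look as follows; the dimension $d_K$ and the max-subtraction trick for a numerically stable softmax are implementation choices, not details specified by the paper:

```python
import numpy as np

def attention_layer(H, Wq, Wk, Wv):
    """Single-head self-attention over the channel matrix, Eq. (11).

    H: (N, M) real-valued network input; Wq, Wk, Wv: (M, d_K) learnable
    matrices. Returns D_out and the softmax-normalized score matrix."""
    Q, K, V = H @ Wq, H @ Wk, H @ Wv
    A = Q @ K.T / np.sqrt(K.shape[1])        # scaled user-to-user scores
    A = A - A.max(axis=1, keepdims=True)     # subtract row max for stability
    S = np.exp(A) / np.exp(A).sum(axis=1, keepdims=True)
    return S @ V, S
```

Each row of the score matrix `S` sums to one, so row $i$ can be read as how strongly user $i$'s channel attends to every other user, which is what the ISR analysis in Section 4.2 visualizes.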
Existing neural network-based precoding schemes face high computational complexity. To solve this problem, we propose to introduce KAN, which is inspired by the Kolmogorov–Arnold representation theorem [20]. KAN is different from traditional MLPs in its unique structural design, which achieves effective modeling of complex functions by placing the activation functions on edges instead of nodes. By carefully designing the nonlinear activation function, KAN is able to adapt to various types of complex channel environments, thus improving the generalization ability of the algorithm as well as reducing the number of network parameters.
The following briefly describes the KAN layers and their computational procedure. Each edge in the two KAN layers corresponds to an activation function; the activation function we adopt is given by (12), where $\mu$, $P_i$ and $\sigma$ are learnable parameters, and $B_{i,n}(x)$ is the B-spline basis function.
$$\phi(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{(x - \mu)^2}{\sigma^2}} + \sum_{i=0}^{n} P_i B_{i,n}(x).$$
The output matrix of the preceding attention layer, $\mathbf{D}_{out}$, is fed into the first KAN layer. The computation from the first KAN layer to the second KAN layer is performed as $\mathbf{x}_1 = \Phi_l(\mathbf{D}_{out})$, where $\Phi_l \in \mathcal{F}^{n_1 \times 1}$ is a vector consisting of the activation functions of the first KAN layer, as shown in (13), and $\mathcal{F}$ is the set of functions.
$$\Phi_l = \left[ \phi_{0,1,1}(\cdot) \;\; \phi_{0,1,2}(\cdot) \;\; \cdots \;\; \phi_{0,1,n_1}(\cdot) \right]^T,$$
where $n_1$ represents the number of nodes in the first KAN layer; each element $\phi_{l,j,i}(\cdot)$ is the function corresponding to the edge connecting the $i$-th node of layer $l$ to the $j$-th node of layer $l+1$. Finally, the precoding matrix $\mathbf{W}$ can be calculated by (14).
$$\mathbf{W} = \begin{bmatrix} \phi_{1,1,1}(\cdot) & \phi_{1,1,2}(\cdot) & \cdots & \phi_{1,1,n_1}(\cdot) \\ \phi_{1,2,1}(\cdot) & \phi_{1,2,2}(\cdot) & \cdots & \phi_{1,2,n_1}(\cdot) \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{1,n_2,1}(\cdot) & \phi_{1,n_2,2}(\cdot) & \cdots & \phi_{1,n_2,n_1}(\cdot) \end{bmatrix} \mathbf{x}_1,$$
where $n_2$ represents the number of nodes in the second layer and $\mathbf{x}_1$ is the output of the first KAN layer.
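A possible sketch of the edge activation in (12), using the Cox–de Boor recursion for the B-spline basis; the knot vector and the spline degree are illustrative assumptions, since the paper does not fix them:

```python
import numpy as np

def bspline_basis(x, i, n, knots):
    """Cox-de Boor recursion for the B-spline basis function B_{i,n}(x)."""
    if n == 0:
        return float(knots[i] <= x < knots[i + 1])
    left = right = 0.0
    if knots[i + n] != knots[i]:
        left = ((x - knots[i]) / (knots[i + n] - knots[i])
                * bspline_basis(x, i, n - 1, knots))
    if knots[i + n + 1] != knots[i + 1]:
        right = ((knots[i + n + 1] - x) / (knots[i + n + 1] - knots[i + 1])
                 * bspline_basis(x, i + 1, n - 1, knots))
    return left + right

def kan_activation(x, mu, sigma, P, knots, n=3):
    """Edge activation of Eq. (12): Gaussian bump plus a B-spline expansion.

    mu, sigma and the spline coefficients P are the learnable parameters."""
    gauss = np.exp(-((x - mu) ** 2) / sigma ** 2) / np.sqrt(2.0 * np.pi)
    spline = sum(P[i] * bspline_basis(x, i, n, knots) for i in range(len(P)))
    return gauss + spline
```

Because the learnable nonlinearity sits on each edge rather than on each node, a small KAN layer can represent functions that an MLP would need many more neurons to fit, which is the source of KAIP's complexity advantage.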

3.2. Training Process

3.2.1. Design of Loss Function

The constraints are not easy to handle when neural networks are used to solve the optimization problem. Therefore, we introduce a penalty function.
The constraint C1: $\mathrm{Tr}(\mathbf{W}^H \mathbf{W}) = 1$ enforces total transmit power normalization at the BS in the massive MIMO system. Specifically, it ensures that the sum of squared magnitudes of all precoding matrix elements equals 1, thereby limiting the BS's radiated power to a fixed budget. We use $\psi_2(\mathrm{Tr}(\mathbf{W}^H \mathbf{W}) - 1) = \lambda_2 \left( \mathrm{Tr}(\mathbf{W}^H \mathbf{W}) - 1 \right)^2$ to describe this constraint. The quadratic term smoothly penalizes deviations from unit power while maintaining convexity near the feasible region.
The constraint C2: $P_{s_j} \geq \bar{P}, \; 1 \leq j \leq N_2$ ensures the minimum RP requirement for the sensing users. Physically, it guarantees that each sensing user receives sufficient signal power $P_{s_j}$ to maintain reliable sensing accuracy, defined by the threshold $\bar{P}$. We use $\psi_1(P_{s_j} - \bar{P}) = \lambda_1 \log(P_{s_j} - \bar{P})$ to describe this constraint. The logarithmic term imposes an unbounded penalty as $P_{s_j}$ approaches $\bar{P}$ from above, ensuring strict satisfaction of the minimum RP requirement while remaining differentiable.
In summary, the optimization problem is converted into (15).
$$\max_{\mathbf{W}} \; R + \psi_1\left( P_{s_j} - \bar{P} \right) - \psi_2\left( \mathrm{Tr}(\mathbf{W}^H \mathbf{W}) - 1 \right),$$
where $\psi_1(x) = \lambda_1 \log(x)$ with penalty weight $\lambda_1 \geq 0$. When $P_{s_j}$ approaches $\bar{P}$, the value of this term tends to $-\infty$, which enforces constraint C2. In (15), $\psi_2(x) = \lambda_2 x^2$ with penalty weight $\lambda_2 \geq 0$; it is subtracted so that deviations from unit transmit power are penalized in the maximization.
Based on the above objective function, we can design the loss function as (16).
$$\mathcal{L} = -\alpha R + \beta \, \mathrm{Var}\left( \{ R_i \}_{i=1}^{N_1} \right) + \gamma \, \mathrm{Var}\left( \{ R_i \,|\, \mathbf{H} \}_{i=1}^{N_1} \right) - \psi_1\left( P_{s_j} - \bar{P} \right) + \psi_2\left( \mathrm{Tr}(\mathbf{W}^H \mathbf{W}) - 1 \right).$$
The loss function consists of five parts. The first part, $-\alpha R$, drives the network toward the maximum SR, where $\alpha$ is a weight coefficient. The second part, $\beta \, \mathrm{Var}(\{R_i\}_{i=1}^{N_1})$, ensures fairness among users in the multi-user scenario, i.e., the achievable rates of the users are kept as close as possible; $\beta$ is a weight coefficient and $\mathrm{Var}$ denotes the variance of the achievable rates of the communication users. The third part, $\gamma \, \mathrm{Var}(\{R_i | \mathbf{H}\}_{i=1}^{N_1})$, improves the adaptability of the model in complex channel environments, where $\gamma$ is a weight coefficient: by measuring the dispersion of the rates under the current channel conditions, the model's channel adaptability is captured. The fourth and fifth parts ensure that the constraints in (10) are met; since the loss is minimized, the signs of the two penalty terms are flipped relative to (15). The weights are determined empirically through extensive experimentation and tuning to balance the contributions of each term.
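The composite objective can be sketched as a minimization loss roughly as follows. The sign conventions (negated sum rate, subtracted log barrier for C2) are our reading of (15) and (16), and the default weights follow the values reported in Section 4:

```python
import numpy as np

def kaip_loss(rates, W, P_s, P_bar, alpha=0.6, beta=0.2, gamma=0.2,
              lam1=1.0, lam2=1.0, rate_var_given_H=None):
    """Composite loss of Eq. (16), written for minimization.

    Sign conventions (negated sum rate, log barrier for C2) are our
    reading of Eqs. (15)-(16); default weights follow Section 4."""
    sum_rate = rates.sum()
    fairness = np.var(rates)                        # Var{R_i}: user fairness
    adapt = rate_var_given_H if rate_var_given_H is not None else 0.0
    barrier = -lam1 * np.sum(np.log(P_s - P_bar))   # -> +inf as P_s -> P_bar (C2)
    power = lam2 * (np.trace(W.conj().T @ W).real - 1.0) ** 2  # C1 penalty
    return -alpha * sum_rate + beta * fairness + gamma * adapt + barrier + power
```

Note that the log barrier requires every $P_{s_j} > \bar{P}$ at initialization; a feasible starting precoder is an implicit assumption of barrier-style penalties.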
In order to avoid overfitting during training, we also add L2 regularization to the learnable parameters $\theta$ of the activation functions as
$$\mathcal{L}_{L2} = \mathcal{L} + \tau \left\| \theta \right\|_2^2,$$
where $\tau$ represents the regularization strength.

3.2.2. Training Method

The training method used in this paper is the Adam algorithm [23]. Firstly, all the parameters of the KAIP network are initialized. Then we split the real and imaginary parts of each element of the complex channel matrix to form a real-valued matrix $\mathbf{H}$ as the input to KAIP. The output of KAIP is a real-valued precoding matrix, which is then recombined into the complex precoding matrix $\mathbf{W}$. $\mathbf{W}$ is used to compute the loss $\mathcal{L}_{L2}$, and the gradient is calculated as $\nabla_\theta \mathcal{L}_{L2}$. Finally, the Adam optimizer is used to update $\theta$ as
$$\theta^* = \theta + \alpha \cdot \mathrm{Adam}\left( \nabla_\theta \mathcal{L}_{L2}, \theta \right),$$
where $\alpha$ here denotes the learning rate.
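The real/complex conversion and a single Adam update can be sketched as follows; this writes the update in the conventional descent form, and the helper names are ours:

```python
import numpy as np

def complex_to_real(H):
    """Stack real and imaginary parts, as done before feeding KAIP."""
    return np.concatenate([H.real, H.imag], axis=-1)

def real_to_complex(W_real):
    """Recombine the network output into the complex precoding matrix."""
    M = W_real.shape[-1] // 2
    return W_real[..., :M] + 1j * W_real[..., M:]

def adam_step(theta, grad, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update on parameters theta given the gradient of L_L2."""
    state['t'] += 1
    state['m'] = b1 * state['m'] + (1 - b1) * grad          # first moment
    state['v'] = b2 * state['v'] + (1 - b2) * grad ** 2     # second moment
    m_hat = state['m'] / (1 - b1 ** state['t'])             # bias correction
    v_hat = state['v'] / (1 - b2 ** state['t'])
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)
```

The split/recombine pair is lossless, so training on the real-valued representation does not discard any channel information.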
In order to further reduce the network size and decrease the computational complexity, we adopt pruning operations in the training process. For a node in the KAN layer, we design an afferent function and an efferent function, which are used to measure the importance of a node receiving/transmitting information from a node in its previous/subsequent layer.
For the $i$-th node in layer $l$, the afferent function $I_{l,i}^{in}$ is given by
$$I_{l,i}^{in} = \max_k \left( \left\| \phi_{l-1,i,k} \right\|_1 \right),$$
where $\phi_{l-1,i,k}$ denotes the activation function (edge) connecting the $k$-th node of the $(l-1)$-th layer to the $i$-th node of the $l$-th layer. $I_{l,i}^{in}$ is the maximum L1 norm over all edges (activation functions) entering that node. In this way, the connection that has the greatest impact on the node's input is highlighted; the afferent function thus reflects the strength of the most dominant message the node receives from its previous layer.
For the $i$-th node in layer $l$, the efferent function $I_{l,i}^{out}$ is given by
$$I_{l,i}^{out} = \max_k \left( \left\| \phi_{l+1,i,k} \right\|_1 \right),$$
where $\phi_{l+1,i,k}$ denotes the activation function (edge) connecting the $i$-th node of layer $l$ to the $k$-th node of layer $(l+1)$. The efferent function takes the maximum L1 norm over all activation functions leaving that node, highlighting the connection that has the greatest impact on the nodes of the subsequent layer.
A node is viewed as important and retained if both its afferent and efferent values are greater than a threshold $\xi$; otherwise, the node is removed from the network. Removing the redundant nodes in this way simplifies the network structure, which helps avoid overfitting and enhances the generalization performance.
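The pruning rule can be sketched as a boolean mask over the nodes of a KAN layer; the matrix layout of the per-edge L1 norms is an assumption made for illustration:

```python
import numpy as np

def prune_mask(incoming_l1, outgoing_l1, xi):
    """Node-importance pruning of Section 3.2.2.

    incoming_l1[i, k]: L1 norm of the edge activation from node k of the
    previous layer into node i; outgoing_l1[i, k]: L1 norm of the edge from
    node i into node k of the next layer. A node survives only if both its
    afferent and efferent scores exceed the threshold xi."""
    afferent = incoming_l1.max(axis=1)   # I_in:  strongest incoming edge
    efferent = outgoing_l1.max(axis=1)   # I_out: strongest outgoing edge
    return (afferent > xi) & (efferent > xi)
```

Requiring both scores to pass the threshold means a node is dropped if it is weak on either side, which is what shrinks $n_1$ and $n_2$ in the complexity expression of Section 4.6.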

4. Numerical Results and Analysis

In order to evaluate the behavior of the proposed KAIP scheme, we carried out extensive numerical studies. We consider a scenario where the number of antennas at the BS is M = 256 and randomly generate a set of 10,000 channel matrices, including 5000 Rician fading channel samples and 5000 Rayleigh fading channel samples. The transmit power is normalized to 1 W, whilst the threshold power is set as P ¯ = 0.15 W. The pathloss exponent α is set to 3.0. The weights in the loss function are set to λ = 1, α = 0.6, β = 0.2, γ = 0.2, η = 1, and the training set and test set for the KAIP model training are divided in an 8:2 ratio. The parameters for training KAIP are shown in Table 1. The proposed KAIP is trained and tested using an Intel i7-12700H CPU (made by Hewlett-Packard in Beijing, China) with Python 3.11 and PyTorch 2.1.2.
In order to show the superiority of the proposed KAIP, the following schemes are used for comparison:
  • ZF: inter-user interference is eliminated by finding the pseudo-inverse of the channel matrix to generate the precoding matrix.
  • MMSE: the precoding matrix is generated by balancing inter-user interference and noise enhancement.
  • FCNN-based precoding scheme [24]: Two fully connected neural layers are adopted in FCNN. The parameters of FCNN are summarized in Table 2.
  • CNN precoding scheme [25]: Two convolutional layers are adopted in CNN. The parameters of CNN are summarized in Table 3.

4.1. Convergence Analysis

Figure 4 shows the best local convergence curves of the two most representative experiments in terms of the number of epochs needed to reach convergence.
The convergence curves illustrate the training dynamics of the proposed KAIP scheme in a Rician fading channel with 16 communication users and 4 sensing users. The horizontal axis (Epoch) represents the number of training iterations, where each epoch corresponds to a full pass of the training dataset through the network. The vertical axis (Loss) quantifies the value of the composite loss function defined in Equation (16).
The first curve (black line, SNR = 20 dB) shows rapid convergence, with the loss stabilizing near −19.6 by epoch 280, accompanied by minimal oscillations. This behavior reflects KAIP’s ability to efficiently optimize precoding matrices under high-SNR conditions, where strong signal dominance reduces gradient noise during training. In contrast, the second curve (red line, SNR = 0 dB) converges more slowly, reaching −10.8 by epoch 400, with larger fluctuations due to heightened interference and channel uncertainty in low-SNR scenarios. The distinct convergence patterns underscore KAIP’s adaptability: higher SNR accelerates optimization by providing clearer gradient directions, while lower SNR introduces instability, necessitating more iterations to mitigate noise-induced perturbations in the loss landscape. The reduced oscillation amplitude at SNR = 20 dB further confirms the scheme’s stability in favorable channel conditions.

4.2. ISR

To quantitatively validate the attention mechanism's interference suppression capability, we analyze the Interference Suppression Ratio (ISR) heatmap generated from a 128-user scenario, where the ISR is defined as $\mathrm{ISR} = 1 - \frac{\mathbf{A}}{\max(\mathbf{A})}$. The larger the ISR, the stronger the interference suppression capability. The simulation results for the ISR are shown in Figure 5.
The simulation results show that the off-diagonal blocks exhibit graduated suppression levels (close to yellow, ratio = 0.4–0.8), reflecting strong interference suppression capability, while the strong diagonal dominance (close to purple, ratio ≈ 0.2) confirms effective self-signal preservation.
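The ISR map described above reduces to a one-liner, where `A` stands for the (non-negative) attention score matrix:

```python
import numpy as np

def isr_map(A):
    """Interference Suppression Ratio heatmap: ISR = 1 - A / max(A).

    Larger off-diagonal ISR values indicate stronger suppression of
    cross-user interference; small diagonal ISR indicates that the
    desired self-signal is preserved."""
    return 1.0 - A / A.max()
```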

4.3. SR

The SR performance is tested with different signal-to-noise ratios (SNRs) in both Rician and Rayleigh fading channels, where the number of communication users is $N_1 = 16$. The SNR is defined as $1/\sigma^2$. As shown in Figure 6 and Figure 7, whether in the Rician or the Rayleigh fading channel, the SR always increases with the SNR. The deep learning-based schemes (i.e., FCNN, CNN, and KAIP) significantly outperform the conventional ZF and MMSE. Moreover, it can be clearly seen that the proposed KAIP achieves the highest SR, with a gain of around 15% over CNN-based precoding and 70% over FCNN-based precoding. The outstanding performance of the proposed KAIP comes from the fact that KAN has a stronger nonlinear fitting ability than FCNN and CNN, so KAIP adapts better to different channel environments.
The SR performance under Rician and Rayleigh fading channels for different numbers of communicating users is given in Figure 8 and Figure 9, respectively, where the SNR is set to 10 dB. From the figures, it can be seen that the SR increases with the number of users in both fading cases. Among all the curves, the proposed KAIP obtains the highest SR, with a significant gain over its counterparts.

4.4. Analysis of α , β and γ

The experimental results in Figure 10, Figure 11 and Figure 12 analyze how the key parameters $\alpha$, $\beta$ and $\gamma$ influence overall system performance across their respective operational ranges of 0.1–0.8, 0.1–0.5 and 0.1–0.5. The SR exhibits saturating growth as $\alpha$ increases: initial rapid improvements in data rate gradually approach an asymptotic limit at higher $\alpha$ values, suggesting fundamental capacity constraints within the system. This enhancement, however, introduces two distinct trade-offs. First, the rate variance among users grows with $\alpha$, indicating degraded fairness, with some users achieving disproportionately higher throughput than others. Second, the inverse relationship between $\alpha$ and the channel-adaptability weight $\gamma$ shows that the improved SR comes at the expense of reduced agility in responding to dynamic channel conditions, potentially compromising performance in time-varying wireless environments. Similarly, the inverse correlation between $\alpha$ and $\beta$ confirms that SR gains directly impact fairness. These interrelated effects establish a fundamental trade-off that system designers must balance when tuning the parameters for specific operational requirements and quality-of-service objectives.

4.5. RP

The RP of four sensing users is tested at different distances from the BS (30 m and 40 m, respectively) in both Rician and Rayleigh fading channels. As shown in Figure 13, Figure 14, Figure 15 and Figure 16, the deep learning based schemes (i.e., FCNN, CNN and KAIP) significantly outperform the conventional ZF and MMSE. It can be seen that the proposed KAIP achieves the highest RP, with a gain of around 4% over CNN-based precoding and 11% over FCNN-based precoding. The RP decreases as the distance between the sensing user and the BS increases. The exceptional performance of the proposed KAIP is attributed to the fact that the neurons of KAN are composed of nonlinear functions, and its nonlinear fitting ability is much stronger than FCNN and CNN. Therefore, the KAIP-generated precoding matrix is more adapted to the different channel conditions at the same distance.

4.6. Complexity

The computation time (CT) of all the algorithms is given in Figure 17 for different numbers of users. As shown in Figure 17, the CT of the deep learning based schemes (i.e., FCNN, CNN and KAIP) is significantly higher than that of the conventional ZF and MMSE. However, KAIP outperforms the FCNN and CNN based schemes, with a decrease in CT of around 90% and 96% compared with the CNN and FCNN based precoding, respectively. KAIP achieves this advantage because its KAN layers can complete the data fitting with fewer hidden layers and fewer neurons. Therefore, the computational complexity of KAIP is much lower than that of the other deep learning based schemes.
Furthermore, Table 4 provides a theoretical comparison of the computational complexity of KAIP and the other baselines. The computational complexity of ZF is primarily determined by its matrix inversion and multiplication operations [26], represented as $O(N^3 + MN^2)$. MMSE shares a similar complexity of $O(N^3 + MN^2)$ due to its reliance on regularized matrix inversion [26]. FCNN involves dense matrix multiplications across layers [27], leading to a complexity of $O(N(Mk_1 + k_1 k_2 + k_2 M))$, where $k_1$ and $k_2$ are the numbers of nodes in the first and second FCNN layers, respectively. The computational complexity of CNN is determined by its convolution operations [28], represented as $O(MN(s_1^2 + s_2^2) + NM^2)$, where $s_1$ and $s_2$ represent the first and second kernel sizes. KAIP combines KAN and attention mechanisms [29], leading to a complexity of $O(N^2 d_K + NM d_K + d_K n_1 + n_2 MN)$, where $n_1$ and $n_2$ represent the numbers of neurons in hidden layers 1 and 2 after pruning.

4.7. Ablation Experiments

The ablation experiments were conducted with 256 BS antennas serving 16 communication users and 4 sensing users. Three scheme variants (KAIP, No Attention, No KAN) were evaluated on the key metrics (RP, SR, CT) for SNRs of 14 dB, 16 dB, 18 dB and 20 dB, user distances of 30 m, 40 m and 50 m, and system scales of N = 10 and 100 users, using mixed Rician/Rayleigh fading channels to validate the individual contributions of the attention mechanism and the KAN layers. The ablation results are shown in Table 5 and Table 6.
For RP, removing the attention mechanism reduces RP by 10–16% across distances (30–50 m), demonstrating its critical role in interference suppression and power allocation for sensing users. The degradation worsens with distance (e.g., −16% at 50 m), highlighting attention’s adaptability to pathloss. Without KAN, RP drops by 3–7%, showing KAN’s ability to fine-tune precoding weights for optimal power delivery, especially in far-field scenarios.
For SR, the absence of attention causes a severe 22–23% SR loss at all SNRs, proving its necessity for multi-user interference management and channel feature extraction. The absence of KAN layers reduces SR by 11–12%, as KAN’s nonlinear fitting capability better adapts to complex channel conditions.

4.8. SR-Weighted (SW)

Finally, from a cost-benefit trade-off perspective, we define a new SW precoding metric inspired by [30] under a Rayleigh fading channel. SW is defined as $\mathrm{SW} = \frac{\mathrm{SR} \times \mathrm{RP}}{\mathrm{CT}}$. The proposed metric quantifies the system efficiency by measuring the joint communication-sensing performance per unit of computational cost. The results are depicted in Figure 18.
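The SW metric is a direct ratio and can be computed as:

```python
def sw_metric(sr, rp, ct):
    """SW = (SR x RP) / CT: joint communication-sensing benefit per unit
    of computation time, as defined in Section 4.8."""
    return sr * rp / ct
```

For example, halving the computation time doubles SW for the same SR and RP, which is why the low-complexity KAIP scores well on this metric.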
The experimental evaluation reveals that KAIP achieves an optimal balance between performance gains and computational efficiency from a cost-benefit standpoint. Compared with the traditional ZF and MMSE methods, KAIP attains 62–137% higher SW, while maintaining a consistent 14–23% advantage over the deep learning baselines, FCNN and CNN. These gains stem from KAIP's synergistic design, which combines complementary components: the attention mechanism contributes a 23% enhancement in SR through its interference suppression capability, while the KAN layers provide a 12% improvement in RP at only an 8% incremental processing overhead. The ablation studies further validate these architectural choices, confirming that each component contributes meaningfully to overall performance without introducing excessive complexity. This balance between performance enhancement and computational cost positions KAIP as a practical solution for real-world deployment scenarios where both efficiency and effectiveness are critical.

5. Conclusions

In this work, we have proposed a novel precoding scheme, KAIP, to improve the performance of massive MIMO ISAC systems in both Rayleigh and Rician fading scenarios. KAIP's attention layer has been shown to suppress inter-user interference, while KAN's spline-based nonlinear activation adapts to dynamic channel conditions with low complexity. The proposed KAIP precoding scheme outperforms the ZF, MMSE, FCNN, and CNN based precoding schemes, achieving significant improvements in both SR and RP with an over 90% reduction in CT compared with the other high-performance deep learning based schemes. Furthermore, from a theoretical perspective, we provide an analysis of KAIP's computational complexity, convergence, and the interference suppression capability of the attention layers, along with experimental verification.
However, this study does not consider scenarios with high-speed users or time-varying channels, which limits the applicability of KAIP; adaptation to high-mobility scenarios is important and remains future work. Although KAN is a key component of KAIP, we do not analyze its intrinsic properties (e.g., approximation guarantees); the rationale and performance of KAN are discussed thoroughly in [31] and are therefore not elaborated on in this paper.

Author Contributions

Methodology, H.W. and W.Z.; software, H.W.; validation, H.W. and W.Z.; resources, W.Z.; writing—original draft preparation, H.W.; writing—review and editing, W.Z. and Z.Z.; supervision, W.Z. and Z.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Student Research Project of Jiangsu University (22A222).

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to data usage restrictions.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MIMO: Multiple-Input Multiple-Output
ISAC: Integrated Sensing and Communication
KAIP: KAN-and-Attention Based Precoding
KAN: Kolmogorov–Arnold Network
SR: Sum Rate
RP: Received Power
CT: Computing Time
ISR: Interference Suppression Ratio
SW: SR-Weighted
FCNN: Fully Connected Neural Network
CNN: Convolutional Neural Network
MF: Matched Filter
ZF: Zero Forcing
MMSE: Minimum Mean Square Error
GD: Gradient Descent
SDR: Semidefinite Relaxation
SDPs: Semidefinite Programs
LNN: Liquid Neural Network
DRL: Deep Reinforcement Learning
MDP: Markov Decision Process
ML: Meta-learning
BS: Base Station

References

  1. Wang, C.X.; You, X.; Gao, X.; Zhu, X.; Li, Z.; Zhang, C.; Wang, H.; Huang, Y.; Chen, Y.; Haas, H.; et al. On the Road to 6G: Visions, Requirements, Key Technologies, and Testbeds. IEEE Commun. Surv. Tutor. 2023, 25, 905–974. [Google Scholar] [CrossRef]
  2. Lu, L.; Li, G.Y.; Swindlehurst, A.L.; Ashikhmin, A.; Zhang, R. An Overview of Massive MIMO: Benefits and Challenges. IEEE J. Sel. Top. Signal Process. 2014, 8, 742–758. [Google Scholar] [CrossRef]
  3. Luo, X.; Lin, Q.; Zhang, R.; Chen, H.H.; Wang, X.; Huang, M. ISAC—A Survey on Its Layered Architecture, Technologies, Standardizations, Prototypes and Testbeds. IEEE Commun. Surv. Tutor. 2025; to appear. [Google Scholar] [CrossRef]
  4. González-Prelcic, N.; Furkan Keskin, M.; Kaltiokallio, O.; Valkama, M.; Dardari, D.; Shen, X.; Shen, Y.; Bayraktar, M.; Wymeersch, H. The Integrated Sensing and Communication Revolution for 6G: Vision, Techniques, and Applications. Proc. IEEE 2024, 112, 676–723. [Google Scholar] [CrossRef]
  5. Feng, C.; Jing, Y.; Jin, S. Interference and Outage Probability Analysis for Massive MIMO Downlink with MF Precoding. IEEE Signal Process. Lett. 2016, 23, 366–370. [Google Scholar] [CrossRef]
  6. Saxena, A.K.; Fijalkow, I.; Swindlehurst, A.L. On one-bit quantized ZF precoding for the multiuser massive MIMO downlink. In Proceedings of the 2016 IEEE Sensor Array and Multichannel Signal Processing Workshop (SAM), Rio de Janeiro, Brazil, 10–13 July 2016; pp. 1–5. [Google Scholar]
  7. Palhares, V.M.T.; de Lamare, R.C.; Flores, A.R.; Landau, L.T.N. Iterative MMSE Precoding and Power Allocation in Cell-Free Massive MIMO Systems. In Proceedings of the 2021 IEEE Statistical Signal Processing Workshop (SSP), Rio de Janeiro, Brazil, 11–14 July 2021; pp. 181–185. [Google Scholar]
  8. Liao, B.; Xiong, X.; Quan, Z. Robust Beamforming Design for Dual-Function Radar-Communication System. IEEE Trans. Veh. Technol. 2023, 72, 7508–7516. [Google Scholar] [CrossRef]
  9. Wajid, I.; Pesavento, M.; Eldar, Y.C.; Ciochina, D. Robust Downlink Beamforming with Partial Channel State Information for Conventional and Cognitive Radio Networks. IEEE Trans. Signal Process. 2013, 61, 3656–3670. [Google Scholar] [CrossRef]
  10. Palhares, V.M.T.; Flores, A.R.; de Lamare, R.C. Robust MMSE Precoding and Power Allocation for Cell-Free Massive MIMO Systems. IEEE Trans. Veh. Technol. 2021, 70, 5115–5120. [Google Scholar] [CrossRef]
  11. Wang, X.; Zhu, F.; Huang, C.; Alhammadi, A.; Bader, F.; Zhang, Z.; Yuen, C.; Debbah, M. Robust Beamforming with Gradient-Based Liquid Neural Network. IEEE Wirel. Commun. Lett. 2024, 13, 3020–3024. [Google Scholar] [CrossRef]
  12. Zhu, F.; Wang, B.; Yang, Z.; Huang, C.; Zhang, Z.; Alexandropoulos, G.C.; Yuen, C.; Debbah, M. Robust Millimeter Beamforming via Self-Supervised Hybrid Deep Learning. In Proceedings of the 2023 31st European Signal Processing Conference (EUSIPCO), Helsinki, Finland, 4–8 September 2023; pp. 915–919. [Google Scholar]
  13. Xu, W.; Gan, L.; Huang, C. A Robust Deep Learning-Based Beamforming Design for RIS-Assisted Multiuser MISO Communications With Practical Constraints. IEEE Trans. Cogn. Commun. Netw. 2022, 8, 694–706. [Google Scholar] [CrossRef]
  14. Huang, C.; Yang, Z.; Alexandropoulos, G.C.; Xiong, K.; Wei, L.; Yuen, C.; Zhang, Z. Hybrid Beamforming for RIS-Empowered Multi-hop Terahertz Communications: A DRL-based Method. In Proceedings of the 2020 IEEE Globecom Workshops (GC Wkshps), Taipei, Taiwan, 7–11 December 2020; pp. 1–6. [Google Scholar]
  15. Chen, D.; Gao, H.; Chen, N.; Cao, R. Integrated Beamforming and Resource Allocation in RIS-Assisted mmWave Networks based on Deep Reinforcement Learning. In Proceedings of the 2023 21st IEEE Interregional NEWCAS Conference (NEWCAS), Edinburgh, UK, 26–28 June 2023; pp. 1–5. [Google Scholar]
  16. Huang, C.; Yang, Z.; Alexandropoulos, G.C.; Xiong, K.; Wei, L.; Yuen, C.; Zhang, Z.; Debbah, M. Multi-Hop RIS-Empowered Terahertz Communications: A DRL-Based Hybrid Beamforming Design. IEEE J. Sel. Areas Commun. 2021, 39, 1663–1677. [Google Scholar] [CrossRef]
  17. Zhu, F.; Wang, X.; Huang, C.; Yang, Z.; Chen, X.; Al Hammadi, A.; Zhang, Z.; Yuen, C.; Debbah, M. Robust Beamforming for RIS-Aided Communications: Gradient-Based Manifold Meta Learning. IEEE Trans. Wirel. Commun. 2024, 23, 15945–15956. [Google Scholar] [CrossRef]
  18. Wang, X.; Zhu, F.; Zhou, Q.; Yu, Q.; Huang, C.; Alhammadi, A.; Zhang, Z.; Yuen, C.; Debbah, M. Energy-Efficient Beamforming for RISs-Aided Communications: Gradient Based Meta Learning. In Proceedings of the ICC 2024—IEEE International Conference on Communications, Denver, CO, USA, 9–13 June 2024; pp. 3464–3469. [Google Scholar]
  19. Xia, J.; Gunduz, D. Meta-learning Based Beamforming Design for MISO Downlink. In Proceedings of the 2021 IEEE International Symposium on Information Theory (ISIT), Melbourne, Australia, 12–20 July 2021; pp. 2954–2959. [Google Scholar]
  20. Li, R.; Song, X.; Wu, Y.; Yu, X.; Huang, H. AKTMD: Attention-KAN-Based Neural Networks for Transportation Mode Detection. IEEE Access 2025, 13, 63690–63702. [Google Scholar] [CrossRef]
  21. Sun, S.; Rappaport, T.S.; Thomas, T.A.; Ghosh, A.; Nguyen, H.C.; Kovács, I.Z.; Rodriguez, I.; Koymen, O.; Partyka, A. Investigation of Prediction Accuracy, Sensitivity, and Parameter Stability of Large-Scale Propagation Path Loss Models for 5G Wireless Communications. IEEE Trans. Veh. Technol. 2016, 65, 2843–2860. [Google Scholar] [CrossRef]
  22. Zhou, Y.; Liu, X.; Zhai, X.; Zhu, Q.; Durrani, T.S. UAV-Enabled Integrated Sensing, Computing, and Communication for Internet of Things: Joint Resource Allocation and Trajectory Design. IEEE Internet Things J. 2024, 11, 12717–12727. [Google Scholar] [CrossRef]
  23. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  24. Chai, M.; Tang, S.; Zhao, M.; Zhou, W. HPNet: A Compressed Neural Network for Robust Hybrid Precoding in Multi-User Massive MIMO Systems. In Proceedings of the GLOBECOM 2020—2020 IEEE Global Communications Conference, Taipei, Taiwan, 7–11 December 2020; pp. 1–7. [Google Scholar]
  25. Liu, F.; Zhang, L.; Du, R.; Li, D.; Li, T. Two-Stage Hybrid Precoding for Minimizing Residuals Using Convolutional Neural Network. IEEE Commun. Lett. 2021, 25, 3903–3907. [Google Scholar] [CrossRef]
  26. Liu, Y.; Liu, J.; Wu, Q.; Zhang, Y.; Jin, M. A Near-Optimal Iterative Linear Precoding with Low Complexity for Massive MIMO Systems. IEEE Commun. Lett. 2019, 23, 1105–1108. [Google Scholar] [CrossRef]
  27. Capra, M.; Bussolino, B.; Marchisio, A.; Masera, G.; Martina, M.; Shafique, M. Hardware and Software Optimizations for Accelerating Deep Neural Networks: Survey of Current Trends, Challenges, and the Road Ahead. IEEE Access 2020, 8, 225134–225180. [Google Scholar] [CrossRef]
  28. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A Survey of Convolutional Neural Networks: Analysis, Applications, and Prospects. IEEE Trans. Neural Networks Learn. Syst. 2022, 33, 6999–7019. [Google Scholar] [CrossRef]
  29. Guo, M.H.; Xu, T.X.; Liu, J.J.; Liu, Z.N.; Jiang, P.T.; Mu, T.J.; Zhang, S.H.; Martin, R.R.; Cheng, M.M.; Hu, S.M. Attention mechanisms in computer vision: A survey. Comput. Vis. Media 2022, 8, 331–368. [Google Scholar] [CrossRef]
  30. Miuccio, L.; Panno, D.; Riolo, S. A Flexible Encoding/Decoding Procedure for 6G SCMA Wireless Networks via Adversarial Machine Learning Techniques. IEEE Trans. Veh. Technol. 2023, 72, 3288–3303. [Google Scholar] [CrossRef]
  31. Liu, Z.; Wang, Y.; Vaidya, S.; Ruehle, F.; Halverson, J.; Soljacic, M.; Hou, T.Y.; Tegmark, M. KAN: Kolmogorov–Arnold Networks. In Proceedings of the Thirteenth International Conference on Learning Representations, Singapore, 24–28 April 2025. [Google Scholar]
Figure 1. System model.
Figure 2. KAIP structure.
Figure 3. Data flow process in the attention layer.
Figure 4. Convergence of KAIP under different SNR.
Figure 5. Simulation results for ISR.
Figure 6. SR performance under a Rician fading channel and various SNR values.
Figure 7. SR performance under a Rayleigh fading channel and various SNR values.
Figure 8. Number of communication users vs. SR (Rician fading channel, SNR = 10 dB).
Figure 9. Number of communication users vs. SR (Rayleigh fading channel, SNR = 10 dB).
Figure 10. Analysis results of α.
Figure 11. Analysis results of β.
Figure 12. Analysis results of γ.
Figure 13. Simulation results of RP with sensing distance of 30 m (Rician fading channel).
Figure 14. Simulation results of RP with sensing distance of 40 m (Rician fading channel).
Figure 15. Simulation results of RP with sensing distance of 30 m (Rayleigh fading channel).
Figure 16. Simulation results of RP with sensing distance of 40 m (Rayleigh fading channel).
Figure 17. The CT of ZF, MMSE, KAIP, CNN, FCNN.
Figure 18. Simulation results of SW.
Table 1. Parameters for training KAIP.
Number of neurons in hidden layer 1: 512
Number of neurons in hidden layer 2: 512
Grid size: 3
Learning rate: 0.001
Batch size: 64
Epochs: 500
Table 2. The setting of FCNN parameters.
Number of nodes in the first FCNN layer: 512
Number of nodes in the second FCNN layer: 512
Learning rate: 0.001
Batch size: 64
Epochs: 500
Table 3. The setting of CNN parameters.
Kernel size of CNN: 4 × 4
Learning rate: 0.001
Batch size: 64
Epochs: 500
Table 4. Comparison of computational complexity.
ZF: O(N^3 + MN^2)
MMSE: O(N^3 + MN^2)
FCNN: O(N(Mk_1 + k_1k_2 + k_2M))
CNN: O(MN(s_1^2 + s_2^2) + NM^2)
KAIP: O(N^2 d_k + NM d_k + d_k n_1 + n_2 MN)
Table 5. Ablation study on RP (mW).
Distance (m): 30 / 40 / 50
KAIP: 322 / 237 / 178
No Attention: 287 (−10%) / 208 (−12%) / 149 (−16%)
No KAN: 311 (−3%) / 225 (−5%) / 165 (−7%)
Table 6. Ablation study on SR (bps/Hz).
SNR (dB): 14 / 16 / 18 / 20
KAIP: 81 / 87 / 92 / 98
No Attention: 62 (−23%) / 67 (−23%) / 71 (−23%) / 76 (−22%)
No KAN: 71 (−12%) / 77 (−11%) / 80 (−12%) / 77 (−12%)
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Wang, H.; Zhang, W.; Zhang, Z. KAN-and-Attention Based Precoding for Massive MIMO ISAC Systems. Electronics 2025, 14, 3232. https://doi.org/10.3390/electronics14163232