Abstract
Maximizing the weighted sum-rate (WSR) in downlink multi-user multiple-input multiple-output (MU-MIMO) systems remains computationally challenging due to the prohibitive complexity of classical weighted minimum mean square error (WMMSE) algorithms. In this article, we propose a novel low-complexity WMMSE (LC-WMMSE) precoding method specifically designed for massive MU-MIMO downlink systems. Our algorithm introduces a hybrid switching approach that adaptively blends standard WMMSE updates with computationally simpler approximations derived via the Woodbury matrix identity, coupled with an adaptive damping mechanism to ensure robust and stable convergence. Simulation results demonstrate that the proposed LC-WMMSE method achieves WSR performance comparable to classical WMMSE with significantly reduced computational complexity, making it particularly suitable for practical implementation in massive MU-MIMO systems.
1. Introduction
Massive MU-MIMO systems are one of the key enabling technologies for fifth-generation (5G) and next-generation wireless communication networks, owing to their capability to substantially enhance spectral efficiency, reliability, and network capacity [1,2,3,4]. In multi-user (MU) scenarios, effectively managing inter-user interference through optimal precoding is essential to fully exploit these advantages. Among existing precoding methods, the weighted minimum mean square error (WMMSE) algorithm is widely recognized for delivering near-optimal weighted sum-rate (WSR) maximization in practical MU-MIMO systems [5,6,7]. However, the WMMSE algorithm involves multiple high-dimensional matrix inversions within each iteration, resulting in a computational complexity that scales cubically with the number of base station antennas [8,9]. Such complexity severely restricts the practical feasibility of classical WMMSE for the large-scale antenna arrays typically found in massive MU-MIMO deployments. To overcome these limitations, recent research has focused on low-complexity alternatives that approximate the WMMSE performance with minimal loss in optimality. Most existing approaches, however, rely on fixed approximations or simplified update rules [10,11], often resulting in noticeable performance degradation and unstable convergence behavior in large-scale scenarios. The WSR maximization problem in the downlink under a sum power constraint (SPC) is non-convex and known to be non-deterministic polynomial-time hard (NP-hard) [10,12,13]. R-WMMSE [10] reduces computational cost through randomized sketching (data reduction), whereas LC-WMMSE reduces cost via structure exploitation: a Woodbury reformulation with a diagonal-weight surrogate in the transmit update step. The former incurs probabilistic approximation error from sketching; the latter is deterministic, with complexity mainly tied to the stream dimension rather than the number of base station (BS) antennas.
Global solutions thus typically involve exponential computational complexity, rendering them impractical for massive MU-MIMO systems. Consequently, our study focuses on developing practical, high-performance precoders with manageable computational complexity. Non-iterative methods, such as maximum ratio transmission (MRT) [14], zero-forcing (ZF) [15], and regularized ZF (RZF) [16] precoding, offer closed-form solutions with computational efficiency but significantly compromise WSR performance because they do not directly optimize the WSR objective. Iterative algorithms for WSR maximization are mainly divided into two categories. The first is the successive convex approximation (SCA) method, in which the authors of [17,18] construct convex surrogate problems of the non-convex WSR objective and solve them to increase the WSR, with proven convergence to a stationary point; various extensions have been proposed to handle different system scenarios [19,20]. The other major class of iterative precoding algorithms is the classical weighted minimum mean-square error (WMMSE) method [21], which exploits the fundamental relationship between the mean-square error (MSE) and the signal-to-interference-plus-noise ratio (SINR). The weighted MSE minimization problem is solved by the block coordinate descent (BCD) method [22], which yields the WMMSE algorithm with three closed-form updates and ensures convergence to a stationary point of the WSR maximization problem. Nonetheless, most of these approaches either suffer from noticeable performance degradation or fail to ensure stable convergence in large-scale scenarios. Among these works, in the context of uplink detection, the adaptive damped Jacobi (DJ) method [23] has been proposed to iteratively approximate the MMSE solution.
The authors of [23] introduced an adaptive damped Jacobi method that dynamically updates the relaxation factor as the iterations proceed, improving performance automatically, particularly in correlated channels. This demonstrates the increasing use of adaptive damping techniques for stabilizing and accelerating iterative algorithms in massive MIMO systems. However, using such an adaptive method for the downlink precoding problem remains underexplored. The WSR maximization problem for precoding presents a different set of challenges, with a different system model and a non-convex objective function. Motivated by these challenges, in this paper we propose a novel low-complexity WMMSE (LC-WMMSE) precoding algorithm tailored explicitly for massive MU-MIMO downlink systems. The key innovations of our approach are twofold. First, we introduce a hybrid switching technique, which adaptively combines computationally intensive classical WMMSE updates with lightweight approximations via an adaptive mixing parameter, thereby significantly reducing complexity during initial iterations without compromising the final WSR performance. Second, we integrate an adaptive damping mechanism, which stabilizes precoder updates and ensures robust and reliable convergence throughout the iterative optimization process. In summary, our primary contributions in this paper are as follows:
- We propose a novel low-complexity WMMSE (LC-WMMSE) precoding algorithm that employs the Woodbury identity to avoid large matrix inversions, significantly reducing computational complexity while maintaining near-optimal performance.
- We introduce a hybrid switching technique that dynamically blends full WMMSE precoder updates with lightweight approximations via an adaptive mixing factor. This approach strategically reduces computational complexity during initial iterations without compromising the final weighted sum-rate (WSR) performance.
- To guarantee monotonic improvement of the WSR objective, we integrate an adaptive damping mechanism into the precoder update procedure. This adaptive strategy significantly enhances convergence stability and robustness, which is beneficial in large-scale system deployments.
- We derive closed-form update rules for all core components of the precoding framework, specifically the receive filters, weight matrices, and precoders, facilitating efficient practical implementation and reducing computational overhead.
- Through comprehensive simulations, we demonstrate that our proposed LC-WMMSE algorithm achieves near-identical WSR performance to the classical WMMSE algorithm while substantially reducing computational runtime. Unlike existing low-complexity methods, our algorithm uniquely combines adaptive damping and hybrid switching, resulting in superior convergence reliability and efficiency, particularly suited for massive MU-MIMO deployments.
Table 1 summarizes the MIMO research areas that are the focus of the contributions from the previously cited works. R-WMMSE [10] reduces computational cost by solving the WMMSE normal equations in a compressed domain through randomized sketching, which effectively performs data dimensionality reduction. In contrast, LC-WMMSE retains the full channel representation and achieves cost efficiency through structural exploitation—specifically, by employing a Woodbury matrix identity reformulation and utilizing a diagonal-weight surrogate exclusively during the transmit filter update. Consequently, the two methods differ fundamentally in terms of update dimensionality, dominant computational complexity, and the source of approximation.
Table 1.
Comparison of R-WMMSE and LC-WMMSE. SPD: symmetric positive definite.
The remainder of this paper is organized as follows. Section 2 describes the system model and problem formulation. Section 3 presents the proposed LC-WMMSE algorithm, including detailed derivations and complexity analysis. Simulation results are presented and discussed in Section 4. Finally, conclusions are drawn in Section 5.
2. System Model
2.1. Downlink System Model
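We consider a single-cell downlink MU-MIMO system where a base station (BS) with M transmit antennas serves K users, each equipped with N receive antennas. The downlink channel from the BS to user k is
$$\mathbf{H}_k \in \mathbb{C}^{N \times M}, \tag{1}$$
whose entries are modeled as i.i.d. circularly symmetric complex Gaussian random variables with zero mean and unit variance, i.e., Rayleigh fading. The BS transmits the signal
$$\mathbf{x} = \sum_{k=1}^{K} \mathbf{V}_k \mathbf{s}_k, \tag{2}$$
where $\mathbf{V}_k \in \mathbb{C}^{M \times N}$ is the linear precoder for user k, and $\mathbf{s}_k \in \mathbb{C}^{N}$ is the data symbol vector for user k with $\mathbb{E}[\mathbf{s}_k \mathbf{s}_k^H] = \mathbf{I}_N$.
Under a flat-fading assumption, the received signal at user k is
$$\mathbf{y}_k = \mathbf{H}_k \mathbf{V}_k \mathbf{s}_k + \sum_{j \neq k} \mathbf{H}_k \mathbf{V}_j \mathbf{s}_j + \mathbf{n}_k, \tag{3}$$
where $\mathbf{n}_k \sim \mathcal{CN}(\mathbf{0}, \sigma^2 \mathbf{I}_N)$ is additive white Gaussian noise. The data vectors $\mathbf{s}_k$ are mutually independent and independent of $\mathbf{n}_k$. All symbols used in this paper are summarized in Table 2.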
Table 2.
Summary of notations. Bold symbols denote matrices/vectors, $(\cdot)^H$ denotes the Hermitian transpose, and $\|\cdot\|_F$ denotes the Frobenius norm.
Remark 1.
(Scalability in Massive MIMO): In practical massive MU-MIMO systems, the number of antennas at the base station (BS) is significantly larger than both the number of antennas at each user [24] and the number of users, i.e., we have $M \gg N$ and $M \gg K$. In such cases, classical algorithms like WMMSE involve large matrix inversions and thus suffer from high computational complexity that scales poorly with M. To address this, the proposed LC-WMMSE algorithm incorporates hybrid switching and adaptive damping techniques, which substantially reduce the complexity. These techniques allow the algorithm to scale efficiently with the number of BS antennas, with a cubic term that does not depend on M in large-scale settings.
2.2. Problem Formulation
A fundamental objective in downlink MU-MIMO is to design the precoders $\{\mathbf{V}_k\}$ to maximize the weighted sum rate (WSR) subject to a transmit power constraint. Let $\alpha_k$ denote the weight for user k. The WSR is defined as
$$R_{\mathrm{WSR}} = \sum_{k=1}^{K} \alpha_k R_k, \tag{4}$$
where the achievable rate of user k is
$$R_k = \log\det\left(\mathbf{I}_N + \mathbf{H}_k \mathbf{V}_k \mathbf{V}_k^H \mathbf{H}_k^H \mathbf{J}_k^{-1}\right), \tag{5}$$
and the covariance matrix of the interference plus noise is given by
$$\mathbf{J}_k = \sum_{j \neq k} \mathbf{H}_k \mathbf{V}_j \mathbf{V}_j^H \mathbf{H}_k^H + \sigma^2 \mathbf{I}_N. \tag{6}$$
The optimization problem is to maximize the WSR over all feasible precoders under a sum power constraint (SPC).
Under the SPC, the WSR maximization problem can be formulated as
$$\max_{\{\mathbf{V}_k\}} \ \sum_{k=1}^{K} \alpha_k R_k \quad \text{s.t.} \quad \sum_{k=1}^{K} \operatorname{tr}\left(\mathbf{V}_k \mathbf{V}_k^H\right) \leq P, \tag{7}$$
where $P$ represents the total transmit power budget of the BS. The WSR maximization problem formulated in Equation (7) is challenging due to the highly nonlinear and non-convex nature of the WSR objective function. Moreover, following [13], it can be shown that the problem is NP-hard, as stated in the following proposition.
Proposition 1.
(WSR maximization is NP-hard): The problem in Equation (7) is NP-hard under the sum power constraint.
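For illustration, the WSR objective of Section 2.2 can be evaluated numerically for a given set of precoders. The following is a minimal NumPy sketch under the i.i.d. Rayleigh model of Section 2.1, not the authors' implementation; the function names (`rayleigh_channels`, `scale_to_power`, `weighted_sum_rate`) and the array layout (channels of shape K x N x M, precoders of shape K x M x N) are assumptions made here.

```python
import numpy as np

def rayleigh_channels(K, N, M, rng):
    """K i.i.d. Rayleigh channels of shape (N, M) with CN(0, 1) entries."""
    real = rng.standard_normal((K, N, M))
    imag = rng.standard_normal((K, N, M))
    return (real + 1j * imag) / np.sqrt(2)

def scale_to_power(V, P):
    """Scale all precoders by one common factor so the total power equals P."""
    return V * np.sqrt(P / np.sum(np.abs(V) ** 2))

def weighted_sum_rate(H, V, alpha, sigma2):
    """WSR in bits/s/Hz for channels H (K,N,M), precoders V (K,M,N), weights alpha (K,)."""
    K, N, _ = H.shape
    total = 0.0
    for k in range(K):
        HV = H[k] @ V                           # (K, N, N): H_k V_j for every user j
        S = HV[k] @ HV[k].conj().T              # desired-signal covariance
        J = sigma2 * np.eye(N, dtype=complex)   # interference-plus-noise covariance
        for j in range(K):
            if j != k:
                J = J + HV[j] @ HV[j].conj().T
        # det(I + J^{-1} S) equals det(I + S J^{-1}); slogdet returns (sign, log|det|).
        _, logdet = np.linalg.slogdet(np.eye(N) + np.linalg.solve(J, S))
        total += alpha[k] * logdet / np.log(2)
    return total
```

In a single-user scalar case the function reduces to the familiar log2(1 + SNR), which is a convenient sanity check.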
3. Proposed LC-WMMSE Algorithm
3.1. The Classical WMMSE Reformulation
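The WMMSE framework is widely used for WSR maximization problems [10]. In this section, we revisit the classical WMMSE approach [21,25] from a purely optimization-theoretic viewpoint, where the mean square error (MSE) does not require a physically meaningful interpretation. The WSR maximization problem in Equation (7) is non-convex and difficult to solve directly. Using the equivalence between rate maximization and weighted MSE minimization, which can be solved by the BCD method [21], the problem can be reformulated as
$$\min_{\{\mathbf{U}_k, \mathbf{W}_k, \mathbf{V}_k\}} \ \sum_{k=1}^{K} \alpha_k \left( \operatorname{tr}\left(\mathbf{W}_k \mathbf{E}_k\right) - \log\det \mathbf{W}_k \right), \tag{8}$$
subject to the same transmit power constraint as in Equation (7). Here, $\alpha_k$ is the priority weight for user k, and the MSE matrix for user k is defined by
$$\mathbf{E}_k = \mathbb{E}\left[ \left(\mathbf{U}_k^H \mathbf{y}_k - \mathbf{s}_k\right)\left(\mathbf{U}_k^H \mathbf{y}_k - \mathbf{s}_k\right)^H \right], \tag{9}$$
where $\mathbf{U}_k$ is the receive filter and $\mathbf{W}_k$ is the weight matrix, both to be optimized jointly with the precoders $\mathbf{V}_k$. Expanding the MSE matrix $\mathbf{E}_k$,
$$\mathbf{E}_k = \left(\mathbf{I}_N - \mathbf{U}_k^H \mathbf{H}_k \mathbf{V}_k\right)\left(\mathbf{I}_N - \mathbf{U}_k^H \mathbf{H}_k \mathbf{V}_k\right)^H + \sum_{j \neq k} \mathbf{U}_k^H \mathbf{H}_k \mathbf{V}_j \mathbf{V}_j^H \mathbf{H}_k^H \mathbf{U}_k + \sigma^2 \mathbf{U}_k^H \mathbf{U}_k. \tag{10}$$
The reformulated objective in Equation (8) is jointly non-convex in $(\mathbf{U}_k, \mathbf{W}_k, \mathbf{V}_k)$ but convex in each block variable individually. Therefore, the optimization can be solved using an alternating optimization approach as follows: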
The update of while fixing the other two block variables is given by
While fixing and , the precoder update is obtained by solving the following problem
where, . We solve the linear system for ; equivalently,
Although the same is used for all users, the right-hand blocks differ; hence, the results are user-specific. Since and in Equation (12), is Hermitian positive definite, the system is well posed.
Here, is the total transmit power and is a small regularization constant (e.g., ) for numerical stability. For the LC update, we replace by in Equation (13), Equation (14) and compute via the Woodbury identity to avoid the inversion.
The classical WMMSE precoding algorithm involves multiple large-scale matrix inversions at each iteration, each of size $M \times M$. The computational complexity is dominated by these inversions, resulting in a prohibitive cubic complexity of $O(M^3)$ per iteration. This becomes particularly challenging in massive MU-MIMO scenarios where M is very large, severely limiting scalability. This motivates the need for efficient alternatives that reduce the matrix inversion cost, as addressed by our proposed LC-WMMSE framework in the next subsection.
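To make the alternating structure concrete, one iteration of a classical WMMSE scheme in the style of [21] can be sketched as follows. This is a hedged NumPy illustration rather than the exact update of Equations (11)-(15): the diagonal loading `mu` (noise-scaled trace term plus a small constant `delta`) and the final rescaling to the power budget are a commonly used variant assumed here, and the names `wsr` and `wmmse_iteration` are hypothetical.

```python
import numpy as np

def wsr(H, V, alpha, sigma2):
    """Weighted sum rate (bits/s/Hz) via log det(C_k) - log det(J_k)."""
    K, N, _ = H.shape
    rate = 0.0
    for k in range(K):
        HV = H[k] @ V                                   # (K, N, N)
        C = sum(HV[j] @ HV[j].conj().T for j in range(K)) + sigma2 * np.eye(N)
        J = C - HV[k] @ HV[k].conj().T                  # interference plus noise
        rate += alpha[k] * (np.linalg.slogdet(C)[1] - np.linalg.slogdet(J)[1]) / np.log(2)
    return rate

def wmmse_iteration(H, V, alpha, sigma2, P, delta=1e-9):
    """One pass: receive filters U_k, weights W_k, then precoders V_k."""
    K, N, M = H.shape
    U = np.empty((K, N, N), dtype=complex)
    W = np.empty((K, N, N), dtype=complex)
    A = np.zeros((M, M), dtype=complex)
    tr = 0.0
    for k in range(K):
        HV = H[k] @ V
        C = sum(HV[j] @ HV[j].conj().T for j in range(K)) + sigma2 * np.eye(N)
        U[k] = np.linalg.solve(C, HV[k])                # MMSE receive filter
        E = np.eye(N) - U[k].conj().T @ HV[k]           # MSE matrix at the MMSE filter
        W[k] = np.linalg.inv(E)                         # weight matrix
        G = H[k].conj().T @ U[k]                        # (M, N)
        A += alpha[k] * G @ W[k] @ G.conj().T
        tr += alpha[k] * np.trace(U[k] @ W[k] @ U[k].conj().T).real
    mu = sigma2 * tr / P + delta                        # assumed diagonal loading
    A += mu * np.eye(M)
    # The M x M solve below is the step whose cost grows cubically with M.
    V_new = np.stack([alpha[k] * np.linalg.solve(A, H[k].conj().T @ U[k] @ W[k])
                      for k in range(K)])
    return V_new * np.sqrt(P / np.sum(np.abs(V_new) ** 2))
```

Calling `wmmse_iteration` repeatedly from a feasible starting point and monitoring `wsr` gives a simple baseline against which a low-complexity variant can be compared.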
3.2. Proposed LC-WMMSE
As mentioned in Section 3.1, the original WMMSE algorithm for the SPC case in [21] requires a high-dimensional matrix operation at each iteration. Motivated by this prohibitive cubic complexity, we propose a novel LC-WMMSE precoding method for massive MU-MIMO systems. Our method integrates hybrid switching, adaptive damping, and simplified precoding approximations to significantly reduce computational complexity while maintaining robust convergence and near-optimal performance.
3.2.1. Problem Reformulation
Our LC-WMMSE replaces the inversion in Equations (13)–(15) by a solve via Woodbury, cutting the dominant per-iteration cost from to in the massive MIMO regime as follows:
- Hybrid Transmit Precoder Update: The transmit precoder update at each iteration is computed using a hybrid combination of the classical WMMSE precoder and a low-complexity approximation precoder as follows:
  where is the classical precoder from Equation (15), is a low-complexity approximation precoder computed with simplified operations to avoid costly matrix inversions, and is an adaptive switching factor designed to balance accuracy and computational efficiency. Specifically, we define as
  where
  The factor in Equations (18) and (19) measures how much the per-iteration MSE changes: When is large (the algorithm is far from a fixed point), , we favor the accurate WMMSE update; near convergence is small, so , we favor the low-complexity step to save computation. This approach of monitoring convergence progress to guide algorithmic behavior follows established optimization principles [22]. The constant smooths the ratio and prevents division by zero (we use unless otherwise stated). In our experiments, results are insensitive to (final WSR variation ). The weight is defined as
  We approximate the full weight by its diagonal , which preserves positive definiteness (diagonal entries are strictly positive due to the regularized MSE) while removing inter-stream couplings. This diagonal form is key to building the block-diagonal matrix in Equation (21), enabling a smaller inversion in the Woodbury step. Using and , we set
  Each block is Hermitian positive definite; therefore . In the LC update, the inverse of appears inside a inversion, so the cubic term scales with rather than M. We horizontally stack the user channels as
  This makes the normal matrix compact and enables the Woodbury identity to trade an inversion for a inversion. The right-hand factor for the precoder update is
  Here concatenates the per-user factors; the k-th block column generates . With the stacked form, the Woodbury precoder in Equation (32) returns whose k-th column block is the user precoder .
Forming costs , followed by a SPD inversion, which is much cheaper than an inversion when . Similarly to prior works [10,21], we apply global power normalization at each iteration to ensure the total transmit power constraint is satisfied. After updating the precoders, we scale them uniformly as follows:
where is the total transmit power budget of the BS. This approach simplifies implementation and preserves convergence, leveraging the fact that the WSR objective is invariant to common scaling of the precoders. We simplify the computationally intensive classical WMMSE precoder update by approximating the involved matrix inversions. Specifically, the proposed hybrid switching approach significantly reduces the frequency of expensive matrix inversions during the iterative procedure. Furthermore, the simplified low-complexity approximation in Equations (20)–(23) employs diagonal approximations and diagonal loading instead of a full matrix inversion, thus avoiding the cubic cost in M.
- Adaptive Damping Factor: To ensure stable and monotonic convergence, we adapt the damping as
  The adaptive damping reduces the step size when the WSR varies rapidly (large ), which stabilizes the iterates without sacrificing monotonic ascent; when changes are small, it allows larger updates for faster progress. Unless otherwise stated, we use , , and in all experiments. Sensitivity tests showed the results are robust for . The smoothed precoder update is
  where is the current LC-WMMSE update before damping. We apply a short Armijo backtracking on (at most 5 trials) and accept the first such that . This stabilizes the iterates and typically does not increase runtime. The adaptive damping mechanism dynamically adjusts the update steps based on the rate of improvement at each iteration, ensuring stable convergence. At iteration t, the instantaneous WSR achieved by the proposed LC-WMMSE algorithm is given by
  which respects the total power budget at the transmitter. Here, denotes the precoders updated at iteration t. This metric is used to monitor convergence and evaluate performance.
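The inversion trade-off described in the list above can be checked numerically. For a stacked matrix B of size KN x M, a positive diagonal weight D, and a loading mu > 0, the push-through form of the Woodbury identity gives (B^H D B + mu I_M)^{-1} B^H D = B^H (B B^H + mu D^{-1})^{-1}, so an M x M solve is replaced by a KN x KN solve. The sketch below is an illustration under these assumptions; the names `precoder_direct` and `precoder_woodbury` and the interpretation of B as a stacked effective channel are hypothetical, not taken from the source.

```python
import numpy as np

def precoder_direct(B, d, mu):
    """Solve the M x M system (B^H D B + mu I) X = B^H D with D = diag(d)."""
    M = B.shape[1]
    BhD = B.conj().T * d                  # scales column s of B^H by d[s]
    return np.linalg.solve(BhD @ B + mu * np.eye(M), BhD)

def precoder_woodbury(B, d, mu):
    """Same X from a KN x KN system: X = B^H (B B^H + mu D^{-1})^{-1}."""
    S = B @ B.conj().T + mu * np.diag(1.0 / d)   # Hermitian positive definite
    # S is Hermitian, so B^H S^{-1} equals (S^{-1} B)^H.
    return np.linalg.solve(S, B).conj().T
```

Because D is diagonal, its inverse is an elementwise reciprocal, which is what keeps the small system cheap to form.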
3.2.2. Adaptive Damping Mechanism
Figure 1 compares LC-WMMSE with adaptive damping, Fixed damping (), and None () at , , and SNR 20 dB (mean over 100 trials). In Table 3 all variants reach essentially the same final WSR (Adaptive , Fixed , None ), but adaptive attains the plateau in far fewer iterations and exhibits smaller late-iteration oscillations. This confirms that adaptive damping improves convergence speed and stability without degrading WSR.
Figure 1.
WSR vs. iteration at 20 dB (i.i.d. Rayleigh), averaged over 100 trials. Final WSRs are nearly identical (Adaptive , Fixed , None ). Adaptive reaches the plateau in a few iterations, whereas Fixed converges slowly, and None shows larger early overshoots.
Table 3.
Ablation study of damping mechanisms (mean over 100 trials at 20 dB).
The oscillation index quantifies the variance observed over the most recent 10 iterations, with lower values indicating greater convergence stability.
3.3. Proposed LC-WMMSE Precoder Updates
The proposed low-complexity WMMSE (LC-WMMSE) precoding algorithm is summarized in Algorithm 1 and consists of the following three main steps:
- Receive Filter Update : At iteration t the receive filter for user k is updated as
  Here is the BS transmit covariance formed from the precoders at the previous iteration:
  The term captures both the desired-signal covariance and the multiuser interference seen by user k; the additive noise is modeled by . The matrix inside the inverse is Hermitian positive definite, so Equation (29) is well posed (solved via Cholesky), and .
- Weight Matrix Update : The weight matrix is updated as, where is the MSE matrix evaluated with and . The small regularizes the inversion and improves conditioning, and sets stream/user priorities (e.g., for WSR maximization). Thus is diagonal and positive definite, which is subsequently exploited by our low-complexity update in Equation (32).
Algorithm 1. Low-Complexity WMMSE (LC-WMMSE) Precoding
3.4. Convergence Analysis
The classical WMMSE algorithm alternates minimization of a convex quadratic surrogate, guaranteeing monotonic ascent of the weighted sum-rate (WSR) [21]. In our LC-WMMSE variant, at iteration t, we compute the low-complexity update by solving the diagonal-weighted surrogate problem (replacing with ), then apply the damped update:
followed by sum power constraint normalization. The step size is selected via Armijo backtracking to ensure immediate WSR improvement.
Proposition 2.
(Convergence): Under the stated Armijo acceptance rule, the sequence is non-decreasing and converges. We define the transmit-update quadratic at iteration t as built from . At each iteration t, choose that minimizes , and set with Armijo backtracking on until for some . Then, the sequence is non-increasing and convergent. Any limit point of is a stationary point of the classical WMMSE objective if the diagonal surrogate error vanishes asymptotically (i.e., becomes diagonally dominant or the hybrid selection converges to ). Otherwise, the limit point is stationary for the surrogate objective with .
Proof.
Non-decrease follows by construction of the acceptance rule. The WSR objective is bounded above under finite SNR and an SPC. Thus is a bounded, non-decreasing sequence and therefore converges. Hybrid selection ensures the best descent direction for ; Armijo backtracking gives sufficient decrease, and boundedness below implies convergence of . The overall scheme is an inexact block-coordinate method; standard results yield stationarity of limit points under vanishing inexactness. □
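The acceptance rule used in the argument above can be sketched as a damped step with backtracking. This is a minimal illustration, assuming the smoothed update is a convex combination of the previous precoders and the candidate update, that the step is halved on rejection, and that a trial is accepted as soon as the objective does not decrease; the function name `damped_step` and its parameters `beta0`, `shrink`, and `max_trials` are hypothetical.

```python
import numpy as np

def damped_step(V_old, V_cand, objective, beta0=1.0, shrink=0.5, max_trials=5, P=None):
    """Return (V, beta): the first damped iterate that does not lower the objective."""
    f_old = objective(V_old)
    beta = beta0
    for _ in range(max_trials):
        V_new = (1.0 - beta) * V_old + beta * V_cand    # damped (smoothed) update
        if P is not None:                               # optional sum-power normalization
            V_new = V_new * np.sqrt(P / np.sum(np.abs(V_new) ** 2))
        if objective(V_new) >= f_old:                   # monotone acceptance test
            return V_new, beta
        beta *= shrink                                  # backtrack with a smaller step
    return V_old, 0.0                                   # no trial accepted: keep iterate
```

With this rule the sequence of objective values is non-decreasing by construction, which is the property the convergence argument relies on.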
3.5. Computational Complexity Analysis
Computational complexity is critical for evaluating precoding algorithms in massive MU-MIMO systems. In the classical WMMSE update [21], the dominant per-iteration cost is the precoder solve in Equation (15), namely the factorization of , which is $O(M^3)$. Receiver and weight updates each cost . The R-WMMSE algorithm [10] has a complexity that is linear in M.
Using the Woodbury identity, the LC-WMMSE update is rewritten as
With and , . The dominant costs per iteration are
- Cholesky/solve of : .
- Gram products and multiplies with (e.g., , ): .
- Per-user factorizations (for and ): .
Algorithm 1 has a dominant per-iteration cost . In the massive-MIMO regime , this is much smaller than . Computing the hybrid switching factor uses Frobenius norms of MSE matrices, costing ; the damping is a few scalar operations, . Both mechanisms reduce the total number of iterations T, further lowering wall-clock time. Table 4 summarizes the per-iteration computational complexity of each component for classical WMMSE versus the proposed LC-WMMSE (Woodbury) implementation.
Table 4.
Per-iteration computational complexity.
Takeaway: because in massive MU-MIMO, LC-WMMSE replaces the cubic term with operations that scale with , yielding the speedups observed in Section 4 while preserving WSR performance.
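To give a sense of scale, consider the configuration of Figure 7 (M = 64, K = 12, N = 2, so KN = 24). Assuming unit constants, a dominant classical term of $M^3$, and dominant Woodbury-based terms of $(KN)^3 + M(KN)^2 + KN^3$ (an illustrative operation-count model assumed here, not a measured cost), a rough comparison is:

$$
\begin{aligned}
M^3 &= 64^3 = 262{,}144, \\
(KN)^3 + M(KN)^2 + KN^3 &= 13{,}824 + 36{,}864 + 96 = 50{,}784, \\
\frac{262{,}144}{50{,}784} &\approx 5.2 .
\end{aligned}
$$

Under these assumptions the per-iteration count drops by roughly a factor of five; actual runtimes also depend on constants, iteration counts, and implementation details.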
3.6. Implementation Considerations and Overhead Analysis
At iteration t, the BS requires downlink (DL) channel state information (CSI) and the current receive filters and weights . In Time Division Duplex (TDD), the BS estimates from Uplink (UL) pilots and computes , locally (no DL feedback per iteration). In Frequency Division Duplex (FDD), user equipments (UEs) estimate from DL pilots and feed back either (i) full and Hermitian , or (ii) an LC mode with only plus a compressed (e.g., codebook index). With bits per complex and per real, the per-user payload is (full) vs. + codebook bits (LC).
4. Simulations and Results
4.1. Simulation Setup
We consider a single-cell massive MU-MIMO downlink system, where the BS is equipped with M transmit antennas and serves K users, and each user receives a number of data streams equal to its number of receive antennas, with . The total transmit power of the BS under the SPC is set to . The channel matrix is generated according to a circularly symmetric standard complex normal distribution with pathloss between the users and the BS. The pathloss model is set to [26], where d denotes the distance between the user and the BS, taking values in . The noise power is set to be equal for all users and is given by , where the signal-to-noise ratio (SNR) is the average received SNR for all users when no precoding is used. For all simulations, we use the hybrid switching constant , damping scale , damping bounds and , convergence tolerance , and maximum iterations . Our simulation results are averaged over 100 randomly generated channel realizations and are obtained under the assumption of perfect channel state information (CSI) at the base station. All computations are performed on an Intel i7-12700H CPU (3.20 GHz) with RTX graphics, 16 GB RAM, and a Windows 11 (64-bit) operating system, in the MATLAB R2024b environment.
4.2. Low-Complexity WMMSE (LC-WMMSE)
In this subsection, we provide simulation results evaluating the performance of the proposed LC-WMMSE algorithm with hybrid switching and adaptive damping. We compare our method with several baselines, including the WMMSE algorithm in [21], the R-WMMSE algorithm [10], and non-iterative precoding methods such as ZF precoding [27] and block diagonalization (BD) precoding [28]. These closed-form methods leverage low-dimensional channel properties (e.g., BD uses null-space projection for interference suppression) and offer low computational complexity (ZF: , BD: ) [27,28], making them practical for massive MU-MIMO systems despite their suboptimal WSR performance. For each trial, we draw and scale to satisfy ; the same is used for all methods to ensure fairness.
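As a reference point for the non-iterative baselines, a zero-forcing precoder can be written in a few lines. The sketch below uses the right pseudo-inverse of the stacked channel with a common power scaling; this is one standard ZF construction assumed here for illustration, and the function name `zf_precoder` and the array layout are hypothetical rather than taken from [27].

```python
import numpy as np

def zf_precoder(H, P):
    """H: (K, N, M) user channels. Returns ZF precoders V: (K, M, N) with total power P."""
    K, N, M = H.shape
    Hs = H.reshape(K * N, M)                             # stack user channels row-wise
    F = Hs.conj().T @ np.linalg.inv(Hs @ Hs.conj().T)    # right pseudo-inverse, (M, KN)
    F = F * np.sqrt(P / np.sum(np.abs(F) ** 2))          # meet the sum power budget
    # Column block k of F (columns k*N ... k*N+N-1) is the precoder of user k.
    return F.reshape(M, K, N).transpose(1, 0, 2)
```

This construction requires KN ≤ M and removes inter-user interference entirely, but it does not optimize the WSR objective, which is why it serves only as a baseline.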
First, we show the convergence performance of the proposed LC-WMMSE algorithm and the classical WMMSE algorithm in Figure 2 and Figure 3. The WSR is measured by bits per second per hertz (bps/Hz). Figure 2 and Figure 3 clearly show the proposed LC-WMMSE algorithm and the WMMSE algorithm converge to the same WSR value. Furthermore, it is observed that starting from the same initial point, the LC-WMMSE algorithm often achieves faster convergence in the initial iterations compared to the WMMSE algorithm, while also maintaining competitive performance with the state-of-the-art R-WMMSE algorithm [10] which employs randomized approximations for complexity reduction.
Figure 2.
(, , , 10 dB) Convergence of the proposed LC-WMMSE algorithm and the classical WMMSE algorithm.
Figure 3.
(, , , 0 dB) Convergence of the proposed LC-WMMSE algorithm and the classical WMMSE algorithm.
Secondly, we compare the proposed LC-WMMSE algorithm with the WMMSE and R-WMMSE algorithms in terms of the average CPU execution time to convergence under different numbers of users K and different numbers of BS transmit antennas M. Figure 4 compares the average computational cost of the algorithms, measured in CPU execution time. The simulation considers a scenario with BS antennas, receive antennas per user, and an average SNR of 10 dB. While runtime increases with K for all methods, LC-WMMSE exhibits a noticeably flatter growth than classical WMMSE, reflecting its lower per-iteration cost. R-WMMSE, which uses randomized/sketched updates, achieves the shortest times overall. At , LC-WMMSE requires versus for WMMSE (≈40% reduction), and R-WMMSE completes in about , i.e., ≈86% faster than WMMSE. Overall, the proposed LC-WMMSE algorithm consistently achieves lower runtime than the classical WMMSE algorithm.
Figure 4.
Average CPU time to convergence versus number of users K (, , 10 dB).
This demonstrates the efficiency of our proposed algorithm, highlighting its suitability for massive MU-MIMO systems, where the number of supported users is typically high. As shown in Figure 5, the simulation scenario is configured with users, each with receive antennas, at an average SNR of 10 dB. The computational cost of all algorithms increases with M. In particular, when , the classical WMMSE algorithm takes 410 s to converge, while our proposed LC-WMMSE algorithm takes 225 s and the R-WMMSE algorithm takes 4 s, because the R-WMMSE algorithm has linear complexity . For instance, at , the LC-WMMSE algorithm achieves a speedup over the classical WMMSE algorithm. These simulation results are consistent with our complexity analysis: the runtime of the LC-WMMSE algorithm grows more slowly with M, whereas the classical WMMSE exhibits cubic complexity.
Figure 5.
Average CPU time to convergence versus the number of BS antennas M (, , 10 dB).
Lastly, we show the WSR performance of our proposed LC-WMMSE algorithm and the other baselines versus SNR under the setting , , and . As shown in Figure 6, our proposed LC-WMMSE algorithm achieves almost the same performance as the classical WMMSE algorithm, and the R-WMMSE algorithm yields almost the same performance as the LC-WMMSE algorithm; all three significantly outperform the BD and ZF algorithms across SNR values. Over 0–30 dB, the mean relative gap of LC-WMMSE to WMMSE is for i.i.d. Rayleigh (see Table 5).
Figure 6.
Weighted sum-rate performance with different SNRs (, , ).
Table 5.
Practical regime (0–30 dB) summary of the relative WSR gap . Values are mean and std across SNRs; worst-loss is the maximum positive gap; best-gain is the maximum (LC over WMMSE). A negative mean indicates LC-WMMSE exceeds classical WMMSE. Setup: , , , SPC , 100 trials/SNR.
4.3. Performance Under Correlated Channels
We also assess robustness under spatial correlation using a Kronecker model at the BS array. For the user k the channel is
With exponential BS correlation,
In all experiments, we set , and obtain from the Hermitian eigendecomposition of . We symmetrize numerically and clip tiny negative eigenvalues before taking square roots. Unless noted, the simulation protocol (SNR grid, , initialization, tolerances, and power normalization) is identical to the i.i.d. case.
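The channel generation described above can be sketched as follows. This is a minimal NumPy illustration assuming a real exponential correlation matrix with entries r^|i-j| at the BS and a channel formed by right-multiplying an i.i.d. Rayleigh matrix by the matrix square root; the function names (`exp_correlation`, `psd_sqrt`, `correlated_channels`) are hypothetical.

```python
import numpy as np

def exp_correlation(M, r):
    """Exponential correlation matrix with entries r^|i-j| (real r assumed)."""
    idx = np.arange(M)
    return r ** np.abs(idx[:, None] - idx[None, :])

def psd_sqrt(R):
    """Hermitian square root via eigendecomposition, with symmetrization and clipping."""
    R = 0.5 * (R + R.conj().T)            # remove numerical asymmetry
    w, Q = np.linalg.eigh(R)
    w = np.clip(w, 0.0, None)             # clip tiny negative eigenvalues
    return (Q * np.sqrt(w)) @ Q.conj().T

def correlated_channels(K, N, M, r, rng):
    """K channels of shape (N, M) with BS-side correlation: H_k = H_w,k R^{1/2}."""
    Rh = psd_sqrt(exp_correlation(M, r))
    Hw = (rng.standard_normal((K, N, M)) + 1j * rng.standard_normal((K, N, M))) / np.sqrt(2)
    return Hw @ Rh
```

Setting r = 0 recovers the i.i.d. Rayleigh case, so the same routine can serve both experiments.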
Figure 7 and Figure 8 demonstrate convergence behavior under a Kronecker-correlated channel with for two system sizes and SNRs (means over 100 trials). In both scenarios, LC-WMMSE closely tracks classical WMMSE and reaches the same final WSR, while R-WMMSE converges fastest to a similar value. Over the 0–30 dB practical regime, the mean LC-WMMSE gap under correlation is approximately (see Table 5). The effect of strong correlation is mainly visible in the early-iteration transient; the steady-state WSR gap between LC-WMMSE and WMMSE remains negligible, confirming robustness across scales and SNR.
Figure 7.
Correlated r = 0.7 (M = 64, K = 12, N = 2, 10 dB), convergence of the proposed LC-WMMSE algorithm and the classical WMMSE algorithm.
Figure 8.
Correlated r = 0.7 (M = 128, K = 16, N = 4, 0 dB), convergence of the proposed LC-WMMSE algorithm and the classical WMMSE algorithm.
Figure 9 reports the weighted sum-rate (WSR) versus SNR under BS correlation. As expected, all methods degrade relative to i.i.d. Rayleigh due to reduced spatial degrees of freedom. Importantly, the proposed LC-WMMSE closely tracks classical WMMSE across the entire SNR range while preserving the computational gains reported earlier. R-WMMSE remains the fastest baseline, and the gap between LC-WMMSE and classical WMMSE is visually negligible in WSR, consistent with the i.i.d. case. Over 0–30 dB, the mean relative gap of LC-WMMSE to WMMSE is for correlated channels (see Table 5). Notably, Table 6 reveals a degradation of R-WMMSE at 30 dB for correlated channels. The cause is sketch-induced approximation bias in solving ill-conditioned normal equations at high SNR; in contrast, deterministic WMMSE and LC-WMMSE avoid this issue and retain the higher WSR.
Figure 9.
Weighted sum-rate performance versus SNR under correlated channels.
Table 6.
Weighted sum-rate (bits/s/Hz) comparison under spatial correlation conditions.
Remark 2.
(UE correlation): The same framework accommodates UE-side correlation by additionally applying a receive-side correlation factor to each user's channel in the Kronecker model.
5. Conclusions
Weighted sum-rate (WSR) maximization is a fundamental problem for massive MU-MIMO systems. In this article, we introduced a novel LC-WMMSE precoding algorithm designed for massive MU-MIMO downlink systems. To reduce the computational runtime relative to the classical WMMSE precoding method, our approach integrates a hybrid switching mechanism and an adaptive damping strategy. The core innovation employs the Woodbury matrix identity to transform the dominant matrix inversion into smaller operations, while the hybrid switching balances the computationally intensive standard WMMSE updates against simpler approximations, controlled by an adaptive mixing factor. The adaptive damping mechanism, in turn, ensures stable and monotonic convergence behavior throughout the iterations. Our simulation results show that the LC-WMMSE algorithm significantly reduces practical runtime while achieving WSR performance close to that of classical WMMSE, making it a computationally efficient drop-in replacement for massive MU-MIMO systems.
Several extensions are left for future work. The LC-WMMSE update extends to hybrid beamforming architectures via an effective-channel formulation; a comprehensive study of hybrid beamforming (incorporating phase constraints, codebooks, and quantization) is deferred. For imperfect CSI, the key challenge is preserving the low-complexity Woodbury/diagonal structure; we will use stochastic/robust WMMSE with diagonal inflations and light Tikhonov regularization so that the transmit update remains an SPD solve. For per-antenna power constraints (PAPC), coupling across antennas breaks the simple power normalization; we will introduce per-antenna dual variables into the transmit update and compute them via bisection/ADMM, preserving the LC complexity.
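The complexity argument behind the Woodbury step can be checked numerically. The sketch below is a generic illustration of the identity, not the exact LC-WMMSE transmit update: it assumes a regularized system of the form (alpha I + H^H D H) X = B with a diagonal positive weight matrix D, and the names `alpha`, `d_w`, `direct_solve`, and `woodbury_solve` are hypothetical. The direct route solves an M x M system, whereas the Woodbury route only solves a d x d system, where d is the total number of streams.

```python
import numpy as np

def direct_solve(H, d_w, alpha, B):
    """Solve (alpha*I_M + H^H D H) X = B with an M x M system."""
    M = H.shape[1]
    A = alpha * np.eye(M) + H.conj().T @ (d_w[:, None] * H)
    return np.linalg.solve(A, B)

def woodbury_solve(H, d_w, alpha, B):
    """Same solution via the Woodbury identity, using only a d x d system.

    (alpha*I + H^H D H)^{-1} B = (B - H^H S^{-1} H B) / alpha,
    with S = alpha * D^{-1} + H H^H.
    """
    S = alpha * np.diag(1.0 / d_w) + H @ H.conj().T
    return (B - H.conj().T @ np.linalg.solve(S, H @ B)) / alpha

rng = np.random.default_rng(1)
M, d = 64, 24                                    # e.g., K = 12 users with N = 2 streams each
H = (rng.standard_normal((d, M)) + 1j * rng.standard_normal((d, M))) / np.sqrt(2)
d_w = rng.uniform(0.5, 2.0, size=d)              # assumed positive diagonal weights
alpha = 0.5                                      # assumed regularization level
B = H.conj().T                                   # example right-hand side

X_direct = direct_solve(H, d_w, alpha, B)
X_woodbury = woodbury_solve(H, d_w, alpha, B)
```

When d is much smaller than M, the d x d solve dominates less than the M x M one, which is the source of the runtime reduction summarized above; the two routes agree up to floating-point error.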
Author Contributions
Conceptualization, V.S. and H.D.; methodology, V.S. and H.D.; software, V.S. and M.S.; validation, V.S., H.D. and X.X.; formal analysis, V.S. and M.S.; investigation, V.S. and X.X.; resources, V.S.; data curation, V.S.; writing—original draft preparation, V.S. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data are contained within the article.
Acknowledgments
We are grateful to the High Performance Computing Center of Central South University for assistance with the computations.
Conflicts of Interest
The authors declare no conflicts of interest.
References
1. Wang, M.; Gao, F.; Jin, S.; Lin, H. An Overview of Enhanced Massive MIMO with Array Signal Processing Techniques. IEEE J. Sel. Topics Signal Process. 2019, 13, 886–901.
2. Marzetta, T.L.; Larsson, E.G.; Yang, H.; Ngo, H.Q. Fundamentals of Massive MIMO; Cambridge University Press: Cambridge, UK, 2016.
3. Pereira de Figueiredo, F.A. An Overview of Massive MIMO for 5G and 6G. IEEE Lat. Am. Trans. 2022, 20, 931–940.
4. Zhang, J.; Björnson, E.; Matthaiou, M.; Ng, D.W.K.; Yang, H.; Love, D.J. Prospective Multiple Antenna Technologies for Beyond 5G. IEEE J. Sel. Areas Commun. 2020, 38, 1637–1660.
5. Peng, M.; Sun, Y.; Li, X.; Mao, Z.; Wang, C. Recent Advances in Cloud Radio Access Networks: System Architectures, Key Techniques, and Open Issues. IEEE Commun. Surv. Tutor. 2016, 18, 2282–2308.
6. Sohrabi, F.; Nuzman, C.; Du, J.; Yang, H.; Viswanathan, H. Energy-Efficient Flat Precoding for MIMO Systems. IEEE Trans. Signal Process. 2025, 73, 795–810.
7. Choi, H.; Swindlehurst, A.L.; Choi, J. WMMSE-Based Rate Maximization for RIS-Assisted MU-MIMO Systems. IEEE Trans. Commun. 2024, 72, 5194–5208.
8. Albreem, M.A.; Juntti, M.; Shahabuddin, S. Massive MIMO Detection Techniques: A Survey. IEEE Commun. Surv. Tutor. 2019, 21, 3109–3132.
9. Xu, Y.; Larsson, E.G.; Jorswieck, E.A.; Li, X.; Jin, S.; Chang, T.H. Distributed Signal Processing for Extremely Large-Scale Antenna Array Systems: State-of-the-Art and Future Directions. IEEE J. Sel. Topics Signal Process. 2025, 19, 304–330.
10. Zhao, X.; Lu, S.; Shi, Q.; Luo, Z.Q. Rethinking WMMSE: Can Its Complexity Scale Linearly With the Number of BS Antennas? IEEE Trans. Signal Process. 2023, 71, 433–446.
11. Chen, C.W.; Tsai, W.C.; Wong, S.S.; Teng, C.F.; Wu, A.Y. WMMSE-Based Alternating Optimization for Low-Complexity Multi-IRS MIMO Communication. IEEE Trans. Veh. Technol. 2022, 71, 11234–11239.
12. Liu, Y.F.; Dai, Y.H.; Luo, Z.Q. Coordinated Beamforming for MISO Interference Channel: Complexity Analysis and Efficient Algorithms. IEEE Trans. Signal Process. 2011, 59, 1142–1157.
13. Luo, Z.Q.; Zhang, S. Dynamic Spectrum Management: Complexity and Duality. IEEE J. Sel. Topics Signal Process. 2008, 2, 57–73.
14. Kammoun, A.; Müller, A.; Björnson, E.; Debbah, M. Linear Precoding Based on Polynomial Expansion: Large-Scale Multi-Cell MIMO Systems. IEEE J. Sel. Topics Signal Process. 2014, 8, 861–875.
15. Gao, X.; Edfors, O.; Rusek, F.; Tufvesson, F. Linear Pre-Coding Performance in Measured Very-Large MIMO Channels. In Proceedings of the 2011 IEEE Vehicular Technology Conference (VTC Fall), San Francisco, CA, USA, 5–8 September 2011; pp. 1–5.
16. Nguyen, L.D.; Tuan, H.D.; Duong, T.Q.; Poor, H.V. Multi-User Regularized Zero-Forcing Beamforming. IEEE Trans. Signal Process. 2019, 67, 2839–2853.
17. Shi, C.; Berry, R.A.; Honig, M.L. Monotonic convergence of distributed interference pricing in wireless networks. In Proceedings of the 2009 IEEE International Symposium on Information Theory (ISIT), Seoul, Republic of Korea, 28 June–3 July 2009; pp. 1619–1623.
18. Kim, S.J.; Giannakis, G.B. Optimal Resource Allocation for MIMO Ad Hoc Cognitive Radio Networks. IEEE Trans. Inf. Theory 2011, 57, 3117–3131.
19. Tran, L.N.; Hanif, M.F.; Tolli, A.; Juntti, M. Fast Converging Algorithm for Weighted Sum Rate Maximization in Multicell MISO Downlink. IEEE Signal Process. Lett. 2012, 19, 872–875.
20. Nguyen, D.H.N.; Le-Ngoc, T. Sum-Rate Maximization in the Multicell MIMO Multiple-Access Channel with Interference Coordination. IEEE Trans. Wireless Commun. 2014, 13, 36–48.
21. Shi, Q.; Razaviyayn, M.; Luo, Z.Q.; He, C. An Iteratively Weighted MMSE Approach to Distributed Sum-Utility Maximization for a MIMO Interfering Broadcast Channel. IEEE Trans. Signal Process. 2011, 59, 4331–4340.
22. Bertsekas, D.P. Nonlinear Programming. J. Oper. Res. Soc. 1997, 48, 334.
23. Zhang, Y.; Yu, A.; Tan, X.; Zhang, Z.; You, X.; Zhang, C. Adaptive Damped Jacobi Detector and Architecture for Massive MIMO Uplink. In Proceedings of the 2018 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Chengdu, China, 26–30 October 2018; pp. 203–206.
24. Björnson, E.; Sanguinetti, L.; Wymeersch, H.; Hoydis, J.; Marzetta, T.L. Massive MIMO is a reality—What is next?: Five promising research directions for antenna arrays. Digit. Signal Process. 2019, 94, 3–20.
25. Shi, Q.; Xu, W.; Wu, J.; Song, E.; Wang, Y. Secure Beamforming for MIMO Broadcasting With Wireless Information and Power Transfer. IEEE Trans. Wireless Commun. 2015, 14, 2841–2853.
26. Dahrouj, H.; Yu, W. Coordinated beamforming for the multicell multi-antenna wireless system. IEEE Trans. Wireless Commun. 2010, 9, 1748–1759.
27. Parfait, T.; Kuang, Y.; Jerry, K. Performance analysis and comparison of ZF and MRT based downlink massive MIMO systems. In Proceedings of the 2014 Sixth International Conference on Ubiquitous and Future Networks (ICUFN), Shanghai, China, 8–11 July 2014; pp. 383–388.
28. Spencer, Q.; Swindlehurst, A.; Haardt, M. Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels. IEEE Trans. Signal Process. 2004, 52, 461–471.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).