A Hybrid Low-Complexity WMMSE Precoder with Adaptive Damping for Massive Multi-User Multiple-Input Multiple-Output Systems

Vaskar Sen; Honggui Deng; Xiaowen Xu; Menghui Shen

doi:10.3390/s25226827

School of Electronic Information, Central South University, Changsha 410004, China

* Author to whom correspondence should be addressed.

Sensors 2025, 25(22), 6827; https://doi.org/10.3390/s25226827

This article belongs to the Special Issue Advanced Massive MIMO Antenna Arrays, Metasurfaces and Reconfigurable Intelligent Surfaces for Sensing, Localization, and Wireless Communications: 2nd Edition

Abstract

Maximizing the weighted sum-rate (WSR) in downlink multi-user multiple-input multiple-output (MU-MIMO) systems remains computationally challenging due to the prohibitive complexity of classical weighted minimum mean square error (WMMSE) algorithms. In this article, we propose a novel low-complexity WMMSE (LC-WMMSE) precoding method specifically designed for massive MU-MIMO downlink systems. Our algorithm introduces a hybrid switching approach that adaptively blends standard WMMSE updates with computationally simpler approximations derived via the Woodbury matrix identity, coupled with an adaptive damping mechanism to ensure robust and stable convergence. Simulation results demonstrate that the proposed LC-WMMSE method achieves WSR performance comparable to classical WMMSE but with significantly reduced computational complexity, making it particularly suitable for practical implementation in massive MU-MIMO systems.

Keywords:

low-complexity precoding; weighted minimum mean square error (WMMSE); massive MU-MIMO; adaptive damping; hybrid switching

1. Introduction

Massive MU-MIMO systems are one of the key enabling technologies for fifth-generation (5G) and next-generation wireless communication networks, owing to their capability to substantially enhance spectral efficiency, reliability, and network capacity [1,2,3,4]. In multi-user (MU) scenarios, effectively managing inter-user interference through optimal precoding is essential to fully exploit these advantages. Among existing precoding methods, the weighted minimum mean square error (WMMSE) algorithm is widely recognized for delivering near-optimal weighted sum-rate (WSR) maximization in practical MU-MIMO systems [5,6,7]. However, the WMMSE algorithm involves multiple high-dimensional matrix inversions within each iteration, resulting in a computational complexity that scales cubically with the number of base station antennas [8,9]. Such complexity severely restricts the practical feasibility of classical WMMSE for the large-scale antenna arrays typically found in massive MU-MIMO deployments. To overcome these limitations, recent research has focused on low-complexity alternatives that approximate the WMMSE performance with minimal loss in optimality. Most existing approaches, however, rely on fixed approximations or simplified update rules [10,11], often resulting in noticeable performance degradation and unstable convergence behavior in large-scale scenarios. The problem of maximizing the WSR in the downlink under a sum power constraint (SPC) is non-convex and known to be non-deterministic polynomial-time hard (NP-hard) [10,12,13]. R-WMMSE [10] reduces computational cost through randomized sketching (data reduction), whereas LC-WMMSE reduces cost via structure exploitation, namely a Woodbury reformulation with a diagonal-weight surrogate in the transmit update step. The former incurs a probabilistic approximation from sketching; the latter is deterministic, with complexity mainly tied to the stream dimension rather than the number of base station (BS) antennas.
Global solutions thus typically involve exponential computational complexity, rendering them impractical for massive MU-MIMO systems. Consequently, our study focuses on developing practical, high-performance precoders with manageable computational complexity. Non-iterative methods, such as maximum ratio transmission (MRT) [14], zero-forcing (ZF) [15], and regularized ZF (RZF) [16] precoding, offer closed-form solutions with high computational efficiency but significantly compromise WSR performance because they do not directly optimize the WSR objective. Iterative algorithms for WSR maximization are mainly divided into two categories. The first is the successive convex approximation (SCA) method, in which the authors of [17,18] construct convex surrogates of the non-convex WSR objective and solve the resulting convex problems to increase the WSR, with proven convergence to a stationary point; various extensions have been proposed to handle different system scenarios [19,20]. The other major class is the classical weighted minimum mean-square error (WMMSE) method [21], which exploits the fundamental relationship between the mean-square error (MSE) and the signal-to-interference-plus-noise ratio (SINR). The weighted MSE minimization problem is solved with the block coordinate descent (BCD) method [22], which leads to the WMMSE algorithm with three closed-form updates and ensures convergence to a stationary point of the WSR maximization problem. Nonetheless, most of these approaches either suffer from noticeable performance degradation or fail to ensure stable convergence in large-scale scenarios. In the related context of uplink detection, the adaptive damped Jacobi (DJ) method [23] has been proposed to iteratively approximate the MMSE solution.
The authors of [23] introduced an adaptive damped Jacobi method that dynamically updates the relaxation factor $ω$ as the iterations proceed, which automatically improves performance, particularly in correlated channels. This demonstrates the increasing use of adaptive damping techniques for stabilizing and accelerating iterative algorithms for massive MIMO systems. However, using such an adaptive method for the downlink precoding problem remains underexplored. The weighted sum rate (WSR) maximization problem for precoding presents a different set of challenges, with a different system model and a non-convex objective function. Motivated by these challenges, in this paper we propose a novel low-complexity WMMSE (LC-WMMSE) precoding algorithm tailored explicitly for massive MU-MIMO downlink systems. The key innovations of our approach are twofold. First, we introduce a hybrid switching technique, which adaptively combines computationally intensive classical WMMSE updates with lightweight approximations via an adaptive mixing parameter, thereby significantly reducing complexity during initial iterations without compromising the final WSR performance. Second, we integrate an adaptive damping mechanism, which stabilizes precoder updates and ensures robust and reliable convergence behavior throughout the iterations. In summary, our primary contributions in this paper are as follows:

We propose a novel low-complexity WMMSE (LC-WMMSE) precoding algorithm that employs the Woodbury identity to avoid large matrix inversions, significantly reducing computational complexity while maintaining near-optimal performance.
We introduce a hybrid switching technique that dynamically blends full WMMSE precoder updates with lightweight approximations via an adaptive mixing factor $ω^{(t)}$. This approach strategically reduces computational complexity during initial iterations without compromising the final weighted sum-rate (WSR) performance.
To guarantee monotonic improvement of the WSR objective, we integrate an adaptive damping mechanism into the precoder update procedure. This adaptive strategy significantly enhances convergence stability and robustness, which is beneficial in large-scale system deployments.
We derive closed-form update rules for all core components of the precoding framework, specifically the receive filters, weight matrices, and precoders, which facilitates efficient practical implementation and reduces computational overhead.
Through comprehensive simulations, we demonstrate that our proposed LC-WMMSE algorithm achieves near-identical WSR performance to the classical WMMSE algorithm while substantially reducing computational runtime. Unlike existing low-complexity methods, our algorithm uniquely combines adaptive damping and hybrid switching, resulting in superior convergence reliability and efficiency, particularly suited for massive MU-MIMO deployments.

Table 1 summarizes the MIMO research areas that are the focus of the contributions from the previously cited works. R-WMMSE [10] reduces computational cost by solving the WMMSE normal equations in a compressed domain through randomized sketching, which effectively performs data dimensionality reduction. In contrast, LC-WMMSE retains the full channel representation and achieves cost efficiency through structural exploitation—specifically, by employing a Woodbury matrix identity reformulation and utilizing a diagonal-weight surrogate exclusively during the transmit filter update. Consequently, the two methods differ fundamentally in terms of update dimensionality, dominant computational complexity, and the source of approximation.

Table 1. Comparison of R-WMMSE and LC-WMMSE. SPD: symmetric positive definite.

The remainder of this paper is organized as follows. Section 2 describes the system model and problem formulation. Section 3 presents the proposed LC-WMMSE algorithm, including detailed derivations and complexity analysis. Simulation results are presented and discussed in Section 4. Finally, conclusions are drawn in Section 5.

2. System Model

2.1. Downlink System Model

We consider a single-cell downlink MU-MIMO system where a base station (BS) with M transmit antennas serves K users, each equipped with N receive antennas. The downlink channel from the BS to the user k is

H_{k} \in C^{M \times N}, k = 1, \dots, K,

(1)

whose entries are modeled as i.i.d. circularly symmetric complex Gaussian random variables with zero mean and unit variance, i.e., Rayleigh fading. The BS transmits the signal

x = \sum_{k = 1}^{K} P_{k} s_{k} \in C^{M \times 1},

(2)

where $P_{k} \in C^{M \times d_{k}}$ is the linear precoder for user k, and $s_{k} \in C^{d_{k} \times 1}$ is the data symbol vector for user k with $E [s_{k} s_{k}^{H}] = I_{d_{k}}$.

Under a flat-fading assumption, the received signal at the user k is

y_{k} = H_{k} x + n_{k} = H_{k} \sum_{j = 1}^{K} P_{j} s_{j} + n_{k},

(3)

where $n_{k} \sim CN (0, σ^{2} I_{N})$ is additive white Gaussian noise. The data vectors $\{s_{k}\}$ are mutually independent and independent of $\{n_{k}\}$. All symbols used in this paper are summarized in Table 2.

Table 2. Summary of notations. Bold denotes matrices/vectors, ${(\cdot)}^{H}$ the Hermitian transpose, and ${∥ \cdot ∥}_{F}$ the Frobenius norm.

Remark 1.

(Scalability in Massive MIMO): In practical massive MU-MIMO systems, the number of antennas at the base station (BS) is significantly larger than both the number of users and the number of antennas at each user [24], i.e., $M ≫ K \geq N$. In such cases, classical algorithms like WMMSE involve large matrix inversions and thus suffer from high computational complexity that scales poorly with M. To address this, the proposed LC-WMMSE algorithm incorporates hybrid switching and adaptive damping techniques, which substantially reduce the complexity. These techniques allow the algorithm to scale efficiently with the number of BS antennas: the cubic term of the per-iteration cost depends on $N K$ rather than on M, and the remaining dependence on M is linear (see Section 3.5).

2.2. Problem Formulation

A fundamental objective in downlink MU-MIMO is to design the precoders $\{P_{k}\}_{k = 1}^{K}$ to maximize the weighted sum rate (WSR) subject to a transmit power constraint. Let $μ_{k} \geq 0$ denote the weight for user k. The WSR is defined as

R = \sum_{k = 1}^{K} μ_{k} R_{k},

(4)

where the achievable rate of user k is

R_{k} = {log}_{2} det (I_{N} + Σ_{k}^{- 1} H_{k} P_{k} P_{k}^{H} H_{k}^{H}),

(5)

and the interference-plus-noise covariance matrix is given by

Σ_{k} = σ^{2} I_{N} + \sum_{j = 1, j \neq k}^{K} H_{k} P_{j} P_{j}^{H} H_{k}^{H} .

(6)
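As a concrete reading of Equations (4)–(6), the following minimal NumPy sketch evaluates the WSR for a given set of precoders. The function name `weighted_sum_rate` and the data layout are illustrative assumptions, not part of the paper: `H[k]` is taken to be the N × M matrix that multiplies x in Equation (3), and `P[k]` the M × d_k precoder.

```python
import numpy as np

def weighted_sum_rate(H, P, mu, sigma2):
    """WSR of Eqs. (4)-(6). H[k]: (N, M) matrix multiplying x in Eq. (3);
    P[k]: (M, d_k) precoder; mu[k]: user weight; sigma2: noise variance."""
    total = 0.0
    for k, Hk in enumerate(H):
        N = Hk.shape[0]
        # Interference-plus-noise covariance of user k, Eq. (6)
        Sigma = sigma2 * np.eye(N, dtype=complex)
        for j, Pj in enumerate(P):
            if j != k:
                HPj = Hk @ Pj
                Sigma += HPj @ HPj.conj().T
        HPk = Hk @ P[k]
        # Rate of user k, Eq. (5); slogdet is used for numerical robustness
        inner = np.eye(N) + np.linalg.solve(Sigma, HPk @ HPk.conj().T)
        _, logdet = np.linalg.slogdet(inner)
        total += mu[k] * logdet / np.log(2.0)
    return float(total)
```

For a single user with a scalar unit channel, unit noise, and transmit power 3, this reduces to log2(1 + 3) = 2 bits per channel use, which is a convenient sanity check.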

The optimization problem is to maximize the WSR over all feasible precoders under a sum power constraint (SPC).

Under the sum power constraint (SPC), the WSR maximization problem can be formulated as

\begin{matrix} \max_{\{P_{k}\}} & \sum_{k = 1}^{K} μ_{k} R_{k}, \\ \text{s.t.} & \sum_{k = 1}^{K} tr (P_{k} P_{k}^{H}) \leq P_{max}, \end{matrix}

(7)

where $P_{max}$ represents the total transmit power budget of the BS. The WSR maximization problem formulated in Equation (7) is challenging due to the highly nonlinear and non-convex nature of the WSR objective function. Moreover, following [13], it can be shown that this problem is NP-hard, as stated in the following proposition.

Proposition 1.

(WSR maximization is NP-hard): Equation (7) is NP-hard under sum power constraints.

3. Proposed LC-WMMSE Algorithm

3.1. The Classical WMMSE Reformulation

The WMMSE framework is widely used for WSR maximization problems [10]. In this section, we revisit the classical WMMSE approach [21,25] from a purely optimization-theoretic viewpoint, where the mean square error (MSE) does not require a physically meaningful interpretation. The WSR maximization problem in Equation (7) is non-convex and difficult to solve directly. Using the equivalence between rate maximization and weighted MSE minimization, the problem can be reformulated as the following problem, which can be solved by the BCD method [21]:

\begin{matrix} min_{{W_{k}, U_{k}, P_{k}}} & \sum_{k = 1}^{K} μ_{k} (Tr (W_{k} E_{k}) - log det (W_{k})) \\ s . t . & \sum_{k = 1}^{K} {∥ P_{k} ∥}_{F}^{2} \leq P_{max} \end{matrix}

(8)

which is subject to the same transmit power constraint as in Equation (7). Here, $μ_{k} \geq 0$ is the priority weight for user k, and the MSE matrix $E_{k}$ for user k is defined by

E_{k} = E [(U_{k}^{H} y_{k} - s_{k}) {(U_{k}^{H} y_{k} - s_{k})}^{H}] ,

(9)

where $U_{k} \in C^{N \times d_{k}}$ is the receive filter and $W_{k} \in C^{d_{k} \times d_{k}}$ is the weight matrix, both to be optimized jointly with the precoders $P_{k}$. Expanding the MSE matrix $E_{k}$ gives

\begin{matrix} E_{k} = & I_{d_{k}} - U_{k}^{H} H_{k} P_{k} - P_{k}^{H} H_{k}^{H} U_{k} \\ & + \sum_{j = 1}^{K} U_{k}^{H} H_{k} P_{j} P_{j}^{H} H_{k}^{H} U_{k} + σ^{2} U_{k}^{H} U_{k} . \end{matrix}

(10)

The reformulated objective in Equation (8) is jointly non-convex in $\{P_{k}, U_{k}, W_{k}\}$ but convex in each block variable when the other two are fixed. Therefore, the problem can be solved by alternating optimization. With $P_{k}$ and $W_{k}$ fixed, the receive filter update is

U_{k} = {(\sum_{j = 1}^{K} H_{k} P_{j} P_{j}^{H} H_{k}^{H} + σ^{2} I_{N})}^{- 1} H_{k} P_{k}, \forall k .

(11)

The update of $W_{k}$ while fixing the other two block variables is given by

W_{k} = μ_{k} {(E_{k} + ε I_{d_{k}})}^{- 1}, E_{k} \in C^{d_{k} \times d_{k}}, \forall k .

(12)
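The link between Equations (10)–(12) and the rate in Equation (5) can be checked numerically. The sketch below is an illustration under stated assumptions: `G` stands for the N × M matrix written as $H_k$ in Equations (3) and (11), and the helper names `mmse_receiver_and_mse` and `rate_from_mse` are hypothetical. It relies on the standard fact that, with the MMSE receiver of Equation (11), the MSE matrix in Equation (10) simplifies to $I - P_k^H G^H U_k$ and that $-\log_2 \det E_k$ equals $R_k$ (taking ε = 0).

```python
import numpy as np

def mmse_receiver_and_mse(G, P, k, sigma2):
    """Return (U_k, E_k) for user k. G: (N, M) channel multiplying x in
    Eq. (3); P: list of (M, d_j) precoders; sigma2: noise variance."""
    N = G.shape[0]
    d_k = P[k].shape[1]
    # Received-signal covariance: sum_j G P_j P_j^H G^H + sigma^2 I
    C = sigma2 * np.eye(N, dtype=complex)
    for Pj in P:
        GPj = G @ Pj
        C += GPj @ GPj.conj().T
    U = np.linalg.solve(C, G @ P[k])                  # Eq. (11)
    # With this U_k, Eq. (10) collapses to I - P_k^H G^H U_k
    E = np.eye(d_k) - (G @ P[k]).conj().T @ U
    return U, E

def rate_from_mse(E):
    """-log2 det(E_k); matches R_k in Eq. (5) for the MMSE receiver."""
    return -np.linalg.slogdet(E)[1] / np.log(2.0)
```

Comparing `rate_from_mse(E)` against a direct evaluation of Equation (5) on a random instance is a quick way to confirm that the sign and transpose conventions in an implementation are consistent.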

With $U_{k}$ and $W_{k}$ fixed, the precoder update is obtained by solving a linear system defined by the matrices

\begin{matrix} A & ≜ σ^{2} I_{M} + \sum_{k = 1}^{K} H_{k} U_{k} W_{k} U_{k}^{H} H_{k}^{H} \in C^{M \times M}, \end{matrix}

(13)

\begin{matrix} B & ≜ [H_{1} U_{1} W_{1} \dots H_{K} U_{K} W_{K}] \in C^{M \times D}, D = \sum_{k = 1}^{K} d_{k} . \end{matrix}

(14)

where $A \in C^{M \times M}$, $B \in C^{M \times D}$, $P \in C^{M \times D}$, and $P_{k} \in C^{M \times d_{k}}$. We solve the linear system $A P = B$ for $P$; equivalently,

P = A^{- 1} B ⟺ P_{k} = A^{- 1} (H_{k} U_{k} W_{k}), k = 1, \dots, K .

(15)

Although the same $A$ is used for all users, the right-hand blocks $H_{k} U_{k} W_{k}$ differ; hence, the resulting $P_{k}$ are user-specific. Since $σ^{2} > 0$ and $ε > 0$ in Equation (12), $A$ is Hermitian positive definite, so the system is well posed.

The solution is then scaled to satisfy the power constraint:

P \leftarrow \sqrt{\frac{P_{max}}{\max ({∥ P ∥}_{F}^{2}, ϵ)}} \cdot P .

(16)

Here, $P_{max}$ is the total transmit power and $ϵ > 0$ is a small regularization constant (e.g., $10^{- 8}$) for numerical stability. For the LC update, we replace $W_{k}$ by $D_{k} = diag (W_{k})$ in Equations (13) and (14) and compute $P$ via the Woodbury identity to avoid the $M \times M$ inversion.

The classical WMMSE precoding algorithm thus involves large-scale matrix inversions of size $M \times M$ at each iteration. The computational complexity is dominated by these inversions, resulting in a prohibitive cubic complexity of $O (M^{3})$ per iteration, which severely limits scalability in massive MU-MIMO deployments where M is very large. This motivates the need for efficient alternatives that reduce the matrix inversion cost, as addressed in our proposed LC-WMMSE framework in the next subsection.

3.2. Proposed LC-WMMSE

As mentioned in Section 3.1, the original WMMSE algorithm for the SPC case in [21] requires a high-dimensional matrix operation at each iteration. Motivated by this prohibitive cubic complexity, we propose a novel LC-WMMSE precoding method for massive MU-MIMO systems. Our method integrates hybrid switching, adaptive damping, and simplified precoding approximations to significantly reduce computational complexity while maintaining robust convergence and near-optimal performance.

3.2.1. Problem Reformulation

Our LC-WMMSE replaces the $M \times M$ inversion in Equations (13)–(15) by an $(N K) \times (N K)$ solve via the Woodbury identity, cutting the dominant per-iteration cost from $O (M^{3})$ to $O (M {(N K)}^{2} + {(N K)}^{3})$ in the massive MIMO regime $M ≫ N K$, as follows:

Hybrid Transmit Precoder Update: The transmit precoder update at each iteration is computed using a hybrid combination of the classical WMMSE precoder $P_{WMMSE}^{(t)}$ and a low-complexity approximation precoder $P_{LC}^{(t)}$ as follows:

$P^{(t)} = ω^{(t)} P_{WMMSE}^{(t)} + (1 - ω^{(t)}) P_{LC}^{(t)} .$

(17)

where $P_{WMMSE}^{(t)}$ is the classical precoder from Equation (15), $P_{LC}^{(t)}$ is a low-complexity approximation precoder computed with simplified operations to avoid costly matrix inversions and $ω^{(t)} \in [0, 1]$ is an adaptive switching factor designed to balance accuracy and computational efficiency. Specifically, we define $ω^{(t)}$ as

$ω^{(t)} = \frac{∥ Δ E^{(t)} ∥_{F}}{∥ Δ E^{(t)} ∥_{F} + κ}$

(18)

where

$Δ E^{(t)} = \sum_{k = 1}^{K} (E_{k}^{(t)} - E_{k}^{(t - 1)})$

(19)

The factor $ω^{(t)} \in [0, 1]$ in Equations (18) and (19) measures how much the per-iteration MSE changes. When $∥ Δ E^{(t)} ∥_{F}$ is large (the algorithm is far from a fixed point), $ω^{(t)} \approx 1$ and we favor the accurate WMMSE update; near convergence, $∥ Δ E^{(t)} ∥_{F}$ is small, so $ω^{(t)} \approx 0$ and we favor the low-complexity step to save computation. This approach of monitoring convergence progress to guide algorithmic behavior follows established optimization principles [22]. The constant $κ > 0$ smooths the ratio and prevents division by zero (we use $κ = 10^{- 3}$ unless otherwise stated). In our experiments, results are insensitive to $κ \in [10^{- 4}, 10^{- 2}]$ (final WSR variation $< 0.3\%$). The diagonal weight surrogate is defined as

$D_{k}^{(t)} = diag (diag (W_{k}^{(t)})) ≻ 0, D_{k}^{(t)} \in C^{d_{k} \times d_{k}}, k = 1, \dots, K .$

(20)

We approximate the full weight by its diagonal $D_{k}^{(t)} = diag (diag (W_{k}^{(t)})) \in C^{d_{k} \times d_{k}}$ , which preserves positive definiteness (diagonal entries are strictly positive due to the regularized MSE) while removing inter-stream couplings. This diagonal form is key to building the block-diagonal matrix in Equation (21), enabling a smaller inversion in the Woodbury step. Using $U_{k}^{(t)}$ and $D_{k}^{(t)}$ , we set

$S^{(t)} = blkdiag (U_{1}^{(t)} D_{1}^{(t)} {U_{1}^{(t)}}^{H}, \dots, U_{K}^{(t)} D_{K}^{(t)} {U_{K}^{(t)}}^{H}) \in C^{(N K) \times (N K)},$

(21)

Each block $U_{k}^{(t)} D_{k}^{(t)} {U_{k}^{(t)}}^{H}$ is Hermitian positive semidefinite, and it is positive definite when $U_{k}^{(t)}$ has full row rank (which requires $d_{k} = N$); in that case $S^{(t)} ≻ 0$. In the LC update, the inverse of $S^{(t)}$ appears inside an $(N K) \times (N K)$ inversion, so the cubic term scales with $N K$ rather than M. We horizontally stack the user channels as

$H = [H_{1}, \dots, H_{K}] \in C^{M \times (N K)},$

(22)

This makes the normal matrix $σ^{2} I_{M} + H S^{(t)} H^{H}$ compact and enables the Woodbury identity to trade an $M \times M$ inversion for a $(N K) \times (N K)$ inversion. The right-hand factor for the precoder update,

$B^{(t)} = [H_{1} U_{1}^{(t)} D_{1}^{(t)}, \dots, H_{K} U_{K}^{(t)} D_{K}^{(t)}] \in C^{M \times D} .$

(23)

Here $B^{(t)}$ concatenates the per-user factors; the k-th block column generates $P_{k}^{(t)}$ . With the stacked form, the Woodbury precoder Equation (32) returns $P^{(t)} \in C^{M \times D}$ whose k-th column block is the user precoder $P_{k}^{(t)}$ . Forming $H^{H} B^{(t)}$ costs $O (M \cdot N K \cdot D)$ , followed by a $(N K) \times (N K)$ SPD inversion—much cheaper than an $M \times M$ inversion when $N K ≪ M$ . Similarly to prior works [10,21], we apply global power normalization at each iteration to ensure the total transmit power constraint is satisfied. After updating the precoders, we scale them uniformly as follows:

$P_{k}^{(t)} \leftarrow \sqrt{\frac{P_{max}}{\sum_{j = 1}^{K} {∥ P_{j}^{(t)} ∥}_{F}^{2}}} P_{k}^{(t)}, \forall k,$

(24)

where $P_{max}$ is the total transmit power budget of the BS. This approach simplifies implementation and preserves convergence, leveraging the fact that the WSR objective is invariant to common scaling of the precoders.
We simplify the computationally intensive classical WMMSE precoder update by approximating the involved matrix inversions. Specifically, the proposed hybrid switching approach reduces the reliance on expensive matrix inversions during the iterative procedure. Furthermore, the simplified low-complexity approximation in Equations (20)–(23) employs diagonal approximations and diagonal loading instead of a full $M \times M$ matrix inversion, thus reducing the dominant cost from cubic in M to $O (M {(N K)}^{2} + {(N K)}^{3})$.
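The switching rule, the hybrid blend, and the power normalization described above can be sketched as three small NumPy helpers. This is a minimal illustration of Equations (17)–(19) and (24), not the authors' implementation; the function names are hypothetical, all users are assumed to have the same number of streams so that the sum in Equation (19) is well defined, and the numerical floor reuses the constant from Equation (16).

```python
import numpy as np

def switching_factor(E_curr, E_prev, kappa=1e-3):
    """Eqs. (18)-(19): omega from the change in the per-user MSE matrices.
    E_curr, E_prev: lists of (d, d) arrays with a common stream count d."""
    delta = sum(Ec - Ep for Ec, Ep in zip(E_curr, E_prev))
    norm = np.linalg.norm(delta, "fro")
    return norm / (norm + kappa)

def hybrid_precoder(P_wmmse, P_lc, omega):
    """Eq. (17): convex combination of the classical and LC candidates."""
    return omega * P_wmmse + (1.0 - omega) * P_lc

def normalize_power(P, p_max, floor=1e-8):
    """Eq. (24) on the stacked (M, D) precoder, since the sum of the
    per-user squared Frobenius norms equals ||P||_F^2."""
    power = max(np.linalg.norm(P, "fro") ** 2, floor)
    return np.sqrt(p_max / power) * P
```

When the MSE matrices stop changing, `switching_factor` returns 0 and `hybrid_precoder` returns the low-complexity candidate unchanged, which matches the behavior described for the near-convergence regime.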
Adaptive Damping Factor: To ensure stable and monotonic convergence, we adapt the damping as

$Δ R^{(t)} ≜ | {W S R}^{(t)} - {W S R}^{(t - 1)} |, α^{(t)} = clip (1 - \frac{Δ R^{(t)}}{η}, α_{min}, α_{max}),$

(25)

The adaptive damping $α^{(t)} = clip (1 - Δ R^{(t)} / η, α_{min}, α_{max})$ reduces the step size when the WSR varies rapidly (large $Δ R^{(t)}$ ), which stabilizes the iterates without sacrificing monotonic ascent; when changes are small, it allows larger updates for faster progress. Unless otherwise stated, we use $η = 10^{- 3}$ , $α_{min} = 0.2$ , and $α_{max} = 0.9$ in all experiments, and we apply a short Armijo backtracking (up to 5 trials) to ensure $WSR (P^{(t + 1)}) \geq WSR (P^{(t)})$ . Sensitivity tests showed the results are robust for $η \in [10^{- 4}, 10^{- 2}]$ . The smoothed precoder update is

$P^{(t)} \leftarrow (1 - α^{(t)}) P^{(t - 1)} + α^{(t)} {\hat{P}}^{(t)},$

(26)

where ${\hat{P}}^{(t)}$ is the current LC-WMMSE update before damping, so that $α^{(t)}$ weights the new update, consistent with Equation (33) and with the undamped case $α \equiv 1$. The backtracking accepts the first $α^{(t)}$ (within at most 5 trials) such that $WSR ((1 - α^{(t)}) P^{(t - 1)} + α^{(t)} {\hat{P}}^{(t)}) \geq WSR (P^{(t - 1)})$. This stabilizes the iterates and typically does not increase runtime. The adaptive damping mechanism dynamically adjusts the update steps based on the rate of improvement at each iteration, ensuring stable convergence. At iteration t, the instantaneous WSR achieved by the proposed LC-WMMSE algorithm is given by

${W S R}^{(t)} = \sum_{k = 1}^{K} μ_{k} {log}_{2} det (I_{N} + H_{k} P_{k}^{(t)} P_{k}^{(t) H} H_{k}^{H} {(σ^{2} I_{N} + \sum_{j \neq k} H_{k} P_{j}^{(t)} P_{j}^{(t) H} H_{k}^{H})}^{- 1})$

(27)

$s . t . \sum_{k = 1}^{K} {∥ P_{k} ∥}_{F}^{2} = P_{max}$

(28)

which respects the total power budget at the transmitter. ${P_{k}^{(t)}}$ denotes the precoders updated at iteration t. This metric is used to monitor convergence and evaluate performance.
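The damping rule of Equation (25) and the backtracking acceptance test can be sketched as follows. The sketch uses the convention of Equation (33), in which $α^{(t)}$ multiplies the step toward the new candidate. The shrink factor of 0.5 and the fallback of keeping the previous iterate when no trial is accepted are assumptions, since the text only states that at most 5 trials are made; `wsr_fn` is a hypothetical callable that returns the WSR of a precoder.

```python
import numpy as np

def adaptive_alpha(wsr_curr, wsr_prev, eta=1e-3, a_min=0.2, a_max=0.9):
    """Eq. (25): alpha = clip(1 - |WSR_curr - WSR_prev| / eta, a_min, a_max)."""
    return float(np.clip(1.0 - abs(wsr_curr - wsr_prev) / eta, a_min, a_max))

def damped_step(P_prev, P_hat, alpha, wsr_fn, max_trials=5):
    """Damped update with backtracking on alpha.
    Returns the accepted iterate and the alpha that produced it."""
    base = wsr_fn(P_prev)
    for _ in range(max_trials):
        P_new = P_prev + alpha * (P_hat - P_prev)   # convention of Eq. (33)
        if wsr_fn(P_new) >= base:
            return P_new, alpha
        alpha *= 0.5      # assumed shrink factor; not specified in the text
    return P_prev, 0.0    # assumed fallback: keep the previous iterate
```

With the default parameters, a WSR change much larger than η maps to the smallest step (0.2) and a vanishing change maps to the largest step (0.9), which is the behavior described in the paragraph on adaptive damping.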

3.2.2. Adaptive Damping Mechanism

Figure 1 compares LC-WMMSE with adaptive damping, fixed damping ($α = 0.8$), and no damping ($α \equiv 1$) at $M = 128$, $K = 16$, $N = 4$, and an SNR of 20 dB (mean over 100 trials). As reported in Table 3, all variants reach essentially the same final WSR (Adaptive $424.90 \pm 2.01$, Fixed $425.79 \pm 2.03$, None $424.42 \pm 2.12$), but the adaptive variant attains the plateau in far fewer iterations and exhibits smaller late-iteration oscillations. This indicates that adaptive damping improves convergence speed and stability without degrading WSR.

Figure 1. WSR vs. iteration at 20 dB (i.i.d. Rayleigh), averaged over 100 trials. Final WSRs are nearly identical (Adaptive $424.90 \pm 2.01$, Fixed $425.79 \pm 2.03$, None $424.42 \pm 2.12$). Adaptive reaches the plateau in a few iterations, whereas Fixed converges slowly, and None shows larger early overshoots.

Table 3. Ablation study of damping mechanisms (mean over 100 trials at 20 dB).

The oscillation index quantifies the variance observed over the most recent 10 iterations, with lower values indicating greater convergence stability.

3.3. Proposed LC-WMMSE Precoder Updates

The proposed low-complexity WMMSE (LC-WMMSE) precoding algorithm is summarized in Algorithm 1 and consists of the following three main steps:

Receive Filter Update $U_{k}^{(t)}$ : At iteration t the receive filter for user k is updated as

$U_{k}^{(t)} = {(H_{k}^{H} S_{x}^{(t - 1)} H_{k} + σ^{2} I_{N})}^{- 1} H_{k}^{H} P_{k}^{(t - 1)}, \forall k .$

(29)

Here $S_{x}^{(t - 1)}$ is the BS transmit covariance formed from the precoders at the previous iteration:

$S_{x}^{(t - 1)} = \sum_{j = 1}^{K} P_{j}^{(t - 1)} {P_{j}^{(t - 1)}}^{H} \in C^{M \times M} .$

(30)

The term $H_{k}^{H} S_{x}^{(t - 1)} H_{k}$ captures both the desired-signal covariance and the multiuser interference seen by user k; the additive noise is modeled by $σ^{2} I_{N}$. The matrix inside the inverse is Hermitian positive definite, so Equation (29) is well posed (solved via Cholesky), and $U_{k}^{(t)} \in C^{N \times d_{k}}$.
Weight Matrix Update $W_{k}^{(t)}$ : The weight matrix is updated as

$W_{k}^{(t)} = μ_{k} {(E_{k}^{(t)} + ε I_{d_{k}})}^{- 1}, \forall k .$

(31)

Here $E_{k}^{(t)}$ is the $d_{k} \times d_{k}$ MSE matrix evaluated with $U_{k}^{(t)}$ and $P_{k}^{(t - 1)}$. The small $ε > 0$ regularizes the inversion and improves conditioning, and $μ_{k} > 0$ sets stream/user priorities (e.g., for WSR maximization). Thus $W_{k}^{(t)} \in C^{d_{k} \times d_{k}}$ is Hermitian positive definite, and its diagonal part $D_{k}^{(t)}$ is subsequently exploited by our low-complexity update in Equation (32).
Transmit Precoder Update $P^{(t)}$ : The transmit precoders are updated by solving a convex quadratic problem

$P^{(t)} = \frac{1}{σ^{2}} [B^{(t)} - H {({(S^{(t)})}^{- 1} + \frac{1}{σ^{2}} H^{H} H)}^{- 1} \frac{1}{σ^{2}} H^{H} B^{(t)}] \in C^{M \times D} .$

(32)

With $D_{k}^{(t)}$, $S^{(t)}$, $H$, and $B^{(t)}$ defined in Equations (20)–(23), the Woodbury update in Equation (32) moves the inversion from size M to size $N K$, yielding a per-iteration cost of $O (M {(N K)}^{2} + {(N K)}^{3})$ instead of $O (M^{3})$.
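To make the Woodbury step concrete, the sketch below implements Equation (32) with the stacked quantities of Equations (21)–(23) and, for checking only, the equivalent direct $M \times M$ solve with diagonal weights. It is an illustration under two assumptions: each $H_k$ is stored as an M × N array as in Equation (22), and each $U_k$ is square and full rank ($d_k = N$) so that every block of $S$ is invertible. The function names are hypothetical.

```python
import numpy as np

def lc_precoder_woodbury(H_list, U_list, D_list, sigma2):
    """Eq. (32): P = A^{-1} B with A = sigma^2 I + H S H^H, via Woodbury.
    H_list[k]: (M, N); U_list[k]: (N, N); D_list[k]: (N, N) diagonal."""
    H = np.hstack(H_list)                                    # Eq. (22)
    B = np.hstack([Hk @ Uk @ Dk                              # Eq. (23)
                   for Hk, Uk, Dk in zip(H_list, U_list, D_list)])
    N = H_list[0].shape[1]
    NK = H.shape[1]
    # S is block diagonal (Eq. (21)), so S^{-1} is inverted block by block
    S_inv = np.zeros((NK, NK), dtype=complex)
    for k, (Uk, Dk) in enumerate(zip(U_list, D_list)):
        Sk = Uk @ Dk @ Uk.conj().T
        S_inv[k * N:(k + 1) * N, k * N:(k + 1) * N] = np.linalg.inv(Sk)
    T = S_inv + (H.conj().T @ H) / sigma2                    # (NK, NK) system
    rhs = (H.conj().T @ B) / sigma2
    return (B - H @ np.linalg.solve(T, rhs)) / sigma2

def lc_precoder_direct(H_list, U_list, D_list, sigma2):
    """Reference (M, M) solve of the same linear system, for checking only."""
    M = H_list[0].shape[0]
    A = sigma2 * np.eye(M, dtype=complex)
    for Hk, Uk, Dk in zip(H_list, U_list, D_list):
        G = Hk @ Uk
        A += G @ Dk @ G.conj().T
    B = np.hstack([Hk @ Uk @ Dk for Hk, Uk, Dk in zip(H_list, U_list, D_list)])
    return np.linalg.solve(A, B)
```

The two functions should agree up to floating-point error on any instance that satisfies the invertibility assumption; only the first avoids forming or factorizing an $M \times M$ matrix.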

Algorithm 1 Low-Complexity WMMSE (LC-WMMSE) Precoding

Require: Channel matrices $\{H_{k}\}_{k = 1}^{K}$, weights $\{μ_{k}\}_{k = 1}^{K}$, noise $σ^{2}$, sum power $P_{max}$, max iters T, tolerance $ε$
1: Initialize $P^{(0)}$ s.t. $\sum_{k = 1}^{K} {∥ P_{k}^{(0)} ∥}_{F}^{2} \leq P_{max}$
2: for $t = 1$ to T do
3:   for $k = 1$ to K do
4:     Update receive filter $U_{k}^{(t)}$ by (29)
5:     Form MSE $E_{k}^{(t)}$ from $U_{k}^{(t)}$ and $P^{(t - 1)}$
6:     Update weight $W_{k}^{(t)}$ by (31)
7:   end for
8:   Compute switching factor $ω^{(t)}$ by (18) and (19)
9:   Classical candidate: build $A, B$ and compute $P_{WMMSE}^{(t)}$ by (13)–(15)
10:   Low-complexity candidate: compute $P_{LC}^{(t)}$ by (20)–(23) and (32)
11:   Hybrid precoder update by (17)
12:   Compute damping $α^{(t)}$ by (25)
13:   Damped update $P^{(t)}$ by (26)
14:   Power normalization via (24)
15:   Compute ${WSR}^{(t)}$ by (27)
16:   if $| {WSR}^{(t)} - {WSR}^{(t - 1)} | < ε$ then
17:     break
18:   end if
19: end for
Ensure: $P^{(t)} = [P_{1}^{(t)}, \dots, P_{K}^{(t)}]$
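The loop structure of Algorithm 1 can be illustrated with the compact NumPy sketch below. It is a simplified variant under several assumptions rather than a reproduction of the authors' code: only the low-complexity candidate is used (equivalent to fixing $ω^{(t)} = 0$), every user has $d_k = N$ streams, each `H[k]` is stored as an M × N array with the user observing $H_k^H x$ as in Equation (29), the damping uses the convention of Equation (33), the WSR change fed into Equation (25) is taken from the two most recent recorded values, and the backtracking shrink factor of 0.5 is an arbitrary choice. All function names are hypothetical.

```python
import numpy as np

def wsr(H, P, mu, s2):
    """Eq. (27) with effective channels G_k = H_k^H of shape (N, M)."""
    out = 0.0
    for k, Hk in enumerate(H):
        G = Hk.conj().T
        N = G.shape[0]
        Sig = s2 * np.eye(N, dtype=complex)
        for j, Pj in enumerate(P):
            if j != k:
                Sig += (G @ Pj) @ (G @ Pj).conj().T
        Q = (G @ P[k]) @ (G @ P[k]).conj().T
        out += mu[k] * np.linalg.slogdet(np.eye(N) + np.linalg.solve(Sig, Q))[1] / np.log(2.0)
    return out

def scale(P, p_max):
    """Eq. (24): uniform scaling so the sum power equals p_max."""
    tot = sum(np.linalg.norm(Pk) ** 2 for Pk in P)
    return [np.sqrt(p_max / tot) * Pk for Pk in P]

def lc_wmmse(H, mu, s2, p_max, T=30, tol=1e-4, eps=1e-8, eta=1e-3, seed=0):
    rng = np.random.default_rng(seed)
    M, N = H[0].shape
    K = len(H)
    P = scale([rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))
               for _ in range(K)], p_max)
    Hs = np.hstack(H)                                     # Eq. (22)
    HhH = Hs.conj().T @ Hs / s2
    hist = [wsr(H, P, mu, s2)]
    for _ in range(T):
        S_inv = np.zeros((N * K, N * K), dtype=complex)
        B = []
        for k, Hk in enumerate(H):
            G = Hk.conj().T
            # Eqs. (29)-(30), formed without the (M, M) transmit covariance
            C = s2 * np.eye(N) + sum((G @ Pj) @ (G @ Pj).conj().T for Pj in P)
            U = np.linalg.solve(C, G @ P[k])
            E = np.eye(N) - (G @ P[k]).conj().T @ U       # MSE at the MMSE receiver
            W = mu[k] * np.linalg.inv(E + eps * np.eye(N))            # Eq. (31)
            D = np.diag(np.diag(W).real)                              # Eq. (20)
            S_inv[k*N:(k+1)*N, k*N:(k+1)*N] = np.linalg.inv(U @ D @ U.conj().T)
            B.append(Hk @ U @ D)                                      # Eq. (23)
        B = np.hstack(B)
        Ph = (B - Hs @ np.linalg.solve(S_inv + HhH, Hs.conj().T @ B / s2)) / s2  # Eq. (32)
        P_hat = [Ph[:, k*N:(k+1)*N] for k in range(K)]
        dR = abs(hist[-1] - hist[-2]) if len(hist) > 1 else np.inf
        alpha = float(np.clip(1.0 - dR / eta, 0.2, 0.9))              # Eq. (25)
        for _ in range(5):                                # backtracking on alpha
            cand = scale([Pp + alpha * (Pn - Pp) for Pp, Pn in zip(P, P_hat)], p_max)
            val = wsr(H, cand, mu, s2)
            if val >= hist[-1]:
                P = cand
                break
            alpha *= 0.5                                  # assumed shrink factor
        else:
            val = hist[-1]                                # no step accepted; keep P
        hist.append(val)
        if abs(hist[-1] - hist[-2]) < tol:
            break
    return P, hist
```

Because a candidate is accepted only when it does not decrease the WSR, the returned history is non-decreasing by construction; this says nothing about how close the final value is to that of the full hybrid scheme.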

3.4. Convergence Analysis

The classical WMMSE algorithm alternates minimization of a convex quadratic surrogate, guaranteeing monotonic ascent of the weighted sum-rate (WSR) [21]. In our LC-WMMSE variant, at iteration t, we compute the low-complexity update ${\hat{P}}^{(t)}$ by solving the diagonal-weighted surrogate problem (replacing $W_{k}$ with $D_{k} = diag (W_{k})$), then apply the damped update:

P^{(t + 1)} = P^{(t)} + α^{(t)} ({\hat{P}}^{(t)} - P^{(t)}), α^{(t)} \in (0, 1],

(33)

followed by sum power constraint normalization. The step size $α^{(t)}$ is selected via Armijo backtracking to ensure immediate WSR improvement.

Proposition 2.

(Convergence): Under the stated Armijo acceptance rule, the sequence $WSR (P^{(t)})$ is non-decreasing and converges. Define the transmit-update quadratic at iteration t as

Q^{(t)} (P) = \frac{1}{2} ⟨ P, A^{(t)} P ⟩ - ℜ \{⟨ B^{(t)}, P ⟩\}

built from $\{(U_{k}^{(t)}, W_{k}^{(t)})\}_{k = 1}^{K}$. At each iteration t, choose ${\hat{P}}^{(t)} \in \{P_{WMMSE}^{(t)}, {\hat{P}}_{LC}^{(t)}\}$ that minimizes $Q^{(t)}$, and set $P^{(t + 1)} = (1 - α^{(t)}) P^{(t)} + α^{(t)} {\hat{P}}^{(t)}$ with Armijo backtracking on $α^{(t)} \in (0, 1]$ until

Q^{(t)} (P^{(t + 1)}) \leq Q^{(t)} (P^{(t)}) - γ α^{(t)} {∥ {\hat{P}}^{(t)} - P^{(t)} ∥}_{F}^{2},

for some $γ > 0$. Then, the sequence $\{Q^{(t)} (P^{(t)})\}$ is non-increasing and convergent. Any limit point of $(U_{k}^{(t)}, W_{k}^{(t)}, P^{(t)})$ is a stationary point of the classical WMMSE objective if the diagonal surrogate error vanishes asymptotically (i.e., $W_{k}^{(t)}$ becomes diagonally dominant or the hybrid selection converges to $P_{WMMSE}^{(t)}$). Otherwise, the limit point is stationary for the surrogate objective with $D_{k} = diag (diag (W_{k}))$.

Proof.

Non-decrease follows by construction from the acceptance rule. The WSR objective is bounded above under finite SNR and an SPC. Thus $WSR (P^{(t)})$ is a bounded, non-decreasing sequence and therefore converges. Hybrid selection picks the candidate with the smaller value of $Q^{(t)}$; Armijo backtracking gives sufficient decrease, and boundedness below implies convergence of $Q^{(t)}$. The overall scheme is an inexact block-coordinate method; standard results yield stationarity of limit points under vanishing inexactness. □

3.5. Computational Complexity Analysis

Computational complexity is critical for evaluating precoding algorithms in massive MU-MIMO systems. In the classical WMMSE update [21], the dominant per-iteration cost is the $M \times M$ precoder solve in Equation (15), namely the factorization of $σ^{2} I_{M} + \sum_{k = 1}^{K} H_{k} U_{k} W_{k} U_{k}^{H} H_{k}^{H}$, which is $O (M^{3})$. Receiver and weight updates each cost $O (K N^{3})$. By comparison, the R-WMMSE algorithm [10] has complexity that is linear in M, i.e., $O (M)$.

Using the Woodbury identity, LC-WMMSE rewrites

A^{- 1} = {(σ^{2} I_{M} + H S H^{H})}^{- 1} = \frac{1}{σ^{2}} [I_{M} - H {(S^{- 1} + \frac{1}{σ^{2}} H^{H} H)}^{- 1} H^{H} / σ^{2}],

(34)

with $H = [H_{1}, \dots, H_{K}] \in C^{M \times N K}$ and $S = blkdiag (S_{1}, \dots, S_{K})$, $S_{k} = U_{k} diag (W_{k}) U_{k}^{H} \in C^{N \times N}$. The dominant costs per iteration are

Cholesky/solve of $T = S^{- 1} + \frac{1}{σ^{2}} H^{H} H \in C^{(N K) \times (N K)}$ : $O ({(N K)}^{3})$ .
Gram products and multiplies with $H$ (e.g., $H^{H} H$ , $H^{H} B$ ): $O (M {(N K)}^{2})$ .
Per-user $N \times N$ factorizations (for $S_{k}^{- 1}$ and $U_{k}$ ): $O (K N^{3})$ .
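As a numerical sanity check of Equation (34), the following Python/NumPy sketch applies the Woodbury form to a right-hand side B and compares it with a direct M × M solve. The helper names, the random Hermitian positive-definite blocks standing in for S_k, and the small dimensions are assumptions chosen only for illustration; a practical implementation would invert S block by block and use a Cholesky factorization of T rather than the generic calls shown here.

```python
import numpy as np

def woodbury_solve(H, S, sigma2, B):
    """Compute (sigma2*I_M + H S H^H)^{-1} B via Eq. (34), using an (NK) x (NK) solve."""
    Hh = H.conj().T
    T = np.linalg.inv(S) + (Hh @ H) / sigma2   # T = S^{-1} + H^H H / sigma2, size (NK) x (NK)
    Y = np.linalg.solve(T, Hh @ B)             # Y = T^{-1} H^H B
    return (B - (H @ Y) / sigma2) / sigma2     # (1/sigma2) [B - H Y / sigma2]

def random_pd_blocks(N, K, rng):
    """Hypothetical stand-in for S = blkdiag(S_1, ..., S_K) with Hermitian PD blocks."""
    S = np.zeros((N * K, N * K), dtype=complex)
    for k in range(K):
        G = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
        S[k * N:(k + 1) * N, k * N:(k + 1) * N] = G @ G.conj().T + np.eye(N)
    return S

rng = np.random.default_rng(0)
M, N, K, sigma2 = 32, 2, 3, 0.5
H = (rng.standard_normal((M, N * K)) + 1j * rng.standard_normal((M, N * K))) / np.sqrt(2)
S = random_pd_blocks(N, K, rng)
B = rng.standard_normal((M, N * K)) + 1j * rng.standard_normal((M, N * K))

X_woodbury = woodbury_solve(H, S, sigma2, B)
# Reference: direct M x M solve of (sigma2*I_M + H S H^H) X = B.
X_direct = np.linalg.solve(sigma2 * np.eye(M) + H @ S @ H.conj().T, B)
```

The two results should agree to numerical precision; only the Woodbury path avoids factorizing an M × M matrix.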

Algorithm 1 has a dominant per-iteration cost

O (M {(N K)}^{2}) + O ({(N K)}^{3}) + O (K N^{3})

. In the massive-MIMO regime

M ≫ N K

, this is much smaller than

O (M^{3})

. Computing the hybrid switching factor

ω^{(t)}

uses Frobenius norms of

N \times N

MSE matrices, costing

O (K N^{2})

; the damping

α^{(t)}

is a few scalar operations,

O (1)

. Both mechanisms reduce the total number of iterations T, further lowering wall-clock time. Table 4 summarizes the per-iteration computational complexity of each component for classical WMMSE versus the proposed LC-WMMSE (Woodbury) implementation.

Table 4. Per-iteration computational complexity.

Takeaway: because

N K ≪ M

in massive MU-MIMO, LC-WMMSE replaces the cubic

O (M^{3})

term with operations that scale with

N K

, yielding the speedups observed in Section 4 while preserving WSR performance.
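To make the scaling concrete, the sketch below evaluates the leading-order operation counts from Table 4 for two configurations used in Section 4. The function names are illustrative, and all constant factors are dropped, so the resulting ratios indicate asymptotic trends only; they are not predictions of the measured CPU times, which also depend on constants, memory traffic, and the number of iterations.

```python
def classical_wmmse_ops(M, N, K):
    """Leading-order per-iteration count for classical WMMSE: M^3 + K N^3."""
    return M**3 + K * N**3

def lc_wmmse_ops(M, N, K):
    """Leading-order per-iteration count for LC-WMMSE: M (NK)^2 + (NK)^3 + K N^3."""
    D = N * K
    return M * D**2 + D**3 + K * N**3

# Configurations taken from Section 4 (N = 4, K = 16, so NK = 64).
for M in (128, 1024):
    c = classical_wmmse_ops(M, 4, 16)
    l = lc_wmmse_ops(M, 4, 16)
    print(f"M={M}: classical={c}, LC={l}, ratio={c / l:.1f}")
```

The ratio grows with M because the classical count is cubic in M while the LC count is linear in M for fixed NK.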

3.6. Implementation Considerations and Overhead Analysis

At iteration t, the BS requires downlink (DL) channel state information (CSI)

H_{k}

and the current receive filters

U_{k}^{(t)}

and weights

W_{k}^{(t)}

. In Time Division Duplex (TDD), the BS estimates

H_{k}

from Uplink (UL) pilots and computes

U_{k}^{(t)}

,

W_{k}^{(t)}

locally (no DL feedback per iteration). In Frequency Division Duplex (FDD), user equipments (UEs) estimate from DL pilots and feed back either (i) full

U_{k}^{(t)} \in C^{N \times d_{k}}

and Hermitian

W_{k}^{(t)} \in C^{d_{k} \times d_{k}}

, or (ii) an LC mode with only

diag (W_{k}^{(t)}) \in R^{d_{k}}

plus a compressed

U_{k}^{(t)}

(e.g., codebook index). With

b_{c}

bits per complex entry and

b_{r}

bits per real entry, the per-user payload is

\approx N d_{k} b_{c} + \frac{d_{k} (d_{k} + 1)}{2} b_{r}

(full) vs.

d_{k} b_{r}

+ codebook bits (LC).
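The two payload expressions can be compared with a short Python sketch. The bit widths (16 bits per complex entry, 8 bits per real entry) and the 6-bit codebook index are hypothetical values chosen only to illustrate the formulas; they do not come from the article.

```python
def full_payload_bits(N, d_k, b_c, b_r):
    """Full feedback: N*d_k complex entries of U_k plus d_k(d_k+1)/2 reals for Hermitian W_k."""
    return N * d_k * b_c + (d_k * (d_k + 1) // 2) * b_r

def lc_payload_bits(d_k, b_r, codebook_bits):
    """LC mode: d_k reals for diag(W_k) plus a codebook index for the compressed U_k."""
    return d_k * b_r + codebook_bits

# Hypothetical quantization: b_c = 16, b_r = 8, 6-bit codebook index; N = d_k = 4.
full_bits = full_payload_bits(N=4, d_k=4, b_c=16, b_r=8)
lc_bits = lc_payload_bits(d_k=4, b_r=8, codebook_bits=6)
```

Under these assumed bit widths the LC mode sends roughly an order of magnitude fewer bits per user per iteration; the actual saving depends on the quantization and codebook design.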

4. Simulations and Results

4.1. Simulation Setup

We consider a single-cell massive MU-MIMO downlink system, where the BS is equipped with M transmit antennas and serves K users, and each user receives a number of data streams equal to their number of receive antennas, with

d_{k} = N

. The total transmit power of the BS under the SPC is set to

P_{max} = 10 [W]

. The channel matrix

H

is generated according to a circularly symmetric standard complex normal distribution with pathloss between the users and the BS. The pathloss model is set to be

128.1 + 37.6 {log}_{10} (d) [dB]

[26], where d denotes the distance between the user and the BS, taking values in the range

[0.1 \sim 0.3] km

. The noise power is set to be equal for all users and is given by

σ^{2} = \frac{P_{m a x}}{10^{S N R / 10}}

, where the signal-to-noise ratio (SNR) is the average received SNR for all users when no precoding is used. For all simulations, we use the hybrid switching constant

κ = 10^{- 3}

, damping scale

η = 10^{- 3}

, damping bounds

α_{min} = 0.2

and

α_{max} = 0.9

, convergence tolerance

ϵ = 10^{- 4}

, and maximum iterations

T = 100

. Our simulation results are averaged over 100 randomly generated channel realizations and assume perfect channel state information (CSI) at the base station. All computations are performed on an Intel i7-12700H CPU (3.20 GHz) with RTX graphics, 16 GB RAM, and a 64-bit Windows 11 operating system, using MATLAB R2024b.
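The setup quantities above can be sketched in Python/NumPy as follows. The function names are illustrative, and the sketch deliberately stops short of combining the pathloss with the small-scale fading, because the text does not specify how the channel is normalized against the SNR definition; the uniform draw of user distances is likewise an assumption.

```python
import numpy as np

def pathloss_db(d_km):
    """Pathloss model from the setup: 128.1 + 37.6 log10(d) dB, with d in km."""
    return 128.1 + 37.6 * np.log10(d_km)

def noise_power(p_max, snr_db):
    """Noise power sigma^2 = P_max / 10^(SNR/10)."""
    return p_max / 10 ** (snr_db / 10)

def draw_iid_channel(M, N, rng):
    """One M x N channel with i.i.d. circularly symmetric CN(0, 1) entries."""
    return (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

rng = np.random.default_rng(1)
P_max, snr_db = 10.0, 10.0
sigma2 = noise_power(P_max, snr_db)
distances_km = rng.uniform(0.1, 0.3, size=16)  # assumption: uniform over the stated range
loss_db = pathloss_db(distances_km)
H_k = draw_iid_channel(128, 4, rng)
```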

4.2. Low-Complexity WMMSE (LC-WMMSE)

In this subsection, we provide simulation results evaluating the performance of the proposed LC-WMMSE algorithm with hybrid switching and adaptive damping. We compare our method with several baselines: the classical WMMSE algorithm [21], the R-WMMSE algorithm [10], and non-iterative precoding methods, namely ZF precoding

P_{ZF} = H^{H} {(H H^{H})}^{- 1}

[27], and the BD precoding

P_{k} = V_{E, 0} T_{k}

[28]. These closed-form methods leverage low-dimensional channel properties (e.g., BD uses null-space projection for interference suppression) and offer low computational complexity, with ZF at

O (K^{3})

and BD at

O (K M^{2})

[27,28], making them practical for massive MU-MIMO systems despite their suboptimal WSR performance. For each trial, we draw

P^{(0)} \sim CN (0, 1)

and scale it to satisfy the SPC; the same

P^{(0)}

is used for all methods to ensure fairness.

First, we show the convergence performance of the proposed LC-WMMSE algorithm and the classical WMMSE algorithm in Figures 2 and 3. The WSR is measured in bits per second per hertz (bps/Hz). Both figures show that the proposed LC-WMMSE algorithm and the WMMSE algorithm converge to the same WSR value. Furthermore, starting from the same initial point, the LC-WMMSE algorithm often converges faster in the initial iterations than the WMMSE algorithm, while remaining competitive with the state-of-the-art R-WMMSE algorithm [10], which employs randomized approximations for complexity reduction.

Figure 2. (

M = 64

,

K = 12

,

N = 2

, 10 dB) Convergence of the proposed LC-WMMSE algorithm and the classical WMMSE algorithm.

Figure 3. (

M = 128

,

K = 16

,

N = 4

, 0 dB) Convergence of the proposed LC-WMMSE algorithm and the classical WMMSE algorithm.

Secondly, we compare the proposed LC-WMMSE algorithm with the WMMSE and R-WMMSE algorithms in terms of the average CPU execution time to convergence under different numbers of users K and different numbers of BS transmit antennas M. Figure 4 compares the average computational cost, measured in CPU execution time, of the three algorithms as K varies. The simulation considers a scenario with

M = 128

BS antennas,

N = 2

receive antennas per user, and an average SNR of 10 dB. Runtime increases with K for all methods, but LC-WMMSE exhibits a noticeably flatter growth than classical WMMSE, reflecting its lower per-iteration cost. R-WMMSE, which uses randomized (sketched) updates, achieves the shortest times overall. At

K = 40

, LC-WMMSE requires about

4 s

versus

6.7 s

for WMMSE (≈40% reduction), while R-WMMSE completes in about

0.9 s

, i.e., ≈86% faster than WMMSE. Although R-WMMSE is the fastest, the proposed LC-WMMSE algorithm consistently runs faster than the classical WMMSE algorithm.

Figure 4. Average CPU time to convergence versus number of users K (

M = 128

,

N = 2

, 10 dB).

These results indicate the efficiency of the proposed algorithm and its suitability for massive MU-MIMO systems, where the number of supported users is typically high. For Figure 5, the simulation scenario is configured with

K = 16

users, each user with

N = 4

receive antennas, at an average SNR of 10 dB. The computational cost of all algorithms increases with M. In particular, when

M = 1024

, the classical WMMSE algorithm takes 410 s to converge, while the proposed LC-WMMSE algorithm takes 225 s and the R-WMMSE algorithm takes 4 s, consistent with the linear complexity of R-WMMSE,

O (M)

. For instance, at

M > 1000

, the LC-WMMSE algorithm achieves a roughly

1.8 \times

speedup over the classical WMMSE algorithm. These results are consistent with our complexity analysis: the runtime of LC-WMMSE grows more slowly with M than that of classical WMMSE, whose precoder update has cubic complexity in M.
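The percentages and speedup quoted above follow directly from the reported CPU times, as the short Python check below shows. The helper names are illustrative; the timings are the ones stated in the text (K = 40 for Figure 4, M = 1024 for Figure 5).

```python
def percent_reduction(baseline_s, method_s):
    """Relative runtime reduction of a method versus a baseline, in percent."""
    return 100.0 * (baseline_s - method_s) / baseline_s

def speedup(baseline_s, method_s):
    """Runtime ratio baseline / method."""
    return baseline_s / method_s

# Figure 4, K = 40: WMMSE 6.7 s, LC-WMMSE 4 s, R-WMMSE 0.9 s.
lc_reduction = percent_reduction(6.7, 4.0)
r_reduction = percent_reduction(6.7, 0.9)

# Figure 5, M = 1024: WMMSE 410 s, LC-WMMSE 225 s.
lc_speedup = speedup(410.0, 225.0)
```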

Figure 5. Average CPU time to convergence versus the number of BS antennas M (

K = 16

,

N = 4

, 10 dB).

Lastly, we show the WSR performance of the proposed LC-WMMSE algorithm and the other baselines versus SNR for the setting

M = 128

,

K = 16

, and

N = 4

. As shown in Figure 6, the proposed LC-WMMSE algorithm achieves almost the same performance as the classical WMMSE algorithm, and the R-WMMSE algorithm performs almost identically to LC-WMMSE; all three significantly outperform the BD and ZF algorithms across the SNR values considered. Over 0–30 dB, the mean relative gap

Δ_{rel}

of LC-WMMSE to WMMSE is

- 0.44 %

for i.i.d. Rayleigh (see Table 5).

Figure 6. Weighted sum-rate performance with different SNRs (

M = 128

,

K = 16

,

N = 4

).

Table 5. Practical regime (0–30 dB) summary of the relative WSR gap

Δ_{rel} (%) = 100 ({WSR}_{WMMSE} - {WSR}_{LC}) / {WSR}_{WMMSE}

. Values are mean and std across SNRs; worst-loss is the maximum positive gap; best-gain is the maximum

- min Δ_{rel}

(LC over WMMSE). A negative mean indicates LC-WMMSE exceeds classical WMMSE. Setup:

M = 128

,

N = 4

,

K = 16

, SPC

P_{m a x} = 10

, 100 trials/SNR.
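The summary statistics defined in the Table 5 caption can be computed as in the following Python/NumPy sketch. The function name is illustrative, the use of the population standard deviation is an assumption, and the three input points are the r = 0.5 values for WMMSE and LC-WMMSE at 10, 20, and 30 dB from Table 6, used only as sample data; they are not expected to reproduce the Table 5 entries, which cover the full 0–30 dB grid.

```python
import numpy as np

def relative_gap_summary(wsr_wmmse, wsr_lc):
    """Summarize Delta_rel (%) = 100 (WSR_WMMSE - WSR_LC) / WSR_WMMSE across SNR points."""
    wm = np.asarray(wsr_wmmse, dtype=float)
    lc = np.asarray(wsr_lc, dtype=float)
    gap = 100.0 * (wm - lc) / wm
    return {
        "mean": gap.mean(),        # negative mean: LC-WMMSE exceeds classical WMMSE
        "std": gap.std(),          # assumption: population standard deviation
        "worst_loss": gap.max(),   # largest gap in favour of WMMSE
        "best_gain": -gap.min(),   # largest gap in favour of LC-WMMSE
    }

# Sample data: Table 6, moderate correlation (r = 0.5), at 10/20/30 dB.
summary = relative_gap_summary([201, 398, 603], [201, 399, 604])
```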

4.3. Performance Under Correlated Channels

We also assess robustness under spatial correlation using a Kronecker model at the BS array. For user k, the channel is

H_{k} = R_{BS}^{1 / 2} W_{k}, W_{k} \sim CN (0, I_{M \times N}),

(35)

with exponential BS correlation,

{[R_{BS}]}_{m, n} = r^{| m - n |}, 0 \leq r < 1 .

(36)

In all experiments, we set

r = 0.5

or

r = 0.7

and obtain

R_{BS}^{1 / 2}

from the Hermitian eigendecomposition of

R_{BS}

. We symmetrize

R_{BS}

numerically and clip tiny negative eigenvalues before taking square roots. Unless noted, the simulation protocol (SNR grid,

(M, N, K)

, initialization, tolerances, and power normalization) is identical to the i.i.d. case.
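The channel generation in Equations (35) and (36) can be sketched in Python/NumPy as follows, including the symmetrization and eigenvalue clipping mentioned above. The function names are illustrative assumptions, and the article's own simulations were run in MATLAB, so this is a sketch of the described procedure rather than the authors' code.

```python
import numpy as np

def exp_correlation(M, r):
    """Exponential BS correlation matrix with entries r^{|m - n|}, Eq. (36)."""
    idx = np.arange(M)
    return r ** np.abs(idx[:, None] - idx[None, :])

def sqrtm_psd(R):
    """Matrix square root via Hermitian eigendecomposition, clipping tiny negative eigenvalues."""
    R = 0.5 * (R + R.conj().T)            # numerical symmetrization
    w, V = np.linalg.eigh(R)
    w = np.clip(w, 0.0, None)             # guard against round-off negatives
    return (V * np.sqrt(w)) @ V.conj().T  # V diag(sqrt(w)) V^H

def correlated_channel(M, N, r, rng):
    """H_k = R_BS^{1/2} W_k with W_k having i.i.d. CN(0, 1) entries, Eq. (35)."""
    W = (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)
    return sqrtm_psd(exp_correlation(M, r)) @ W

rng = np.random.default_rng(2)
R_bs = exp_correlation(8, 0.7)
R_half = sqrtm_psd(R_bs)
H_k = correlated_channel(8, 2, 0.7, rng)
```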

Figure 7 and Figure 8 demonstrate convergence behavior under a Kronecker-correlated channel with

r = 0.7

for two system sizes and SNRs (means over 100 trials). In both scenarios, LC-WMMSE closely tracks classical WMMSE and reaches the same final WSR, while R-WMMSE converges fastest to a similar value. Over the 0–30 dB practical regime, the magnitude of the mean LC-WMMSE gap under correlation is approximately

0.5 %

(see Table 5). The effect of strong correlation is mainly visible in the early-iteration transient; the steady-state WSR gap between LC-WMMSE and WMMSE remains negligible, confirming robustness across scales and SNR.

Figure 7. Correlated r = 0.7 (M = 64, K = 12, N = 2, 10 dB), convergence of the proposed LC-WMMSE algorithm and the classical WMMSE algorithm.

Figure 8. Correlated r = 0.7 (M = 128, K = 16, N = 4, 0 dB), convergence of the proposed LC-WMMSE algorithm and the classical WMMSE algorithm.

Figure 9 reports the weighted sum-rate (WSR) versus SNR under BS correlation. As expected, all methods degrade relative to i.i.d. Rayleigh due to reduced spatial degrees of freedom. Importantly, the proposed LC-WMMSE closely tracks classical WMMSE across the entire SNR range while preserving the computational gains reported earlier. R-WMMSE remains the fastest baseline, and the gap between LC-WMMSE and classical WMMSE is visually negligible in WSR, consistent with the i.i.d. case. Over 0–30 dB, the mean relative gap

Δ_{rel}

of LC-WMMSE to WMMSE is

- 0.48 %

for correlated channels (see Table 5). Notably, Table 6 reveals a degradation of R-WMMSE at 30 dB for correlated channels (557 versus 603 bits/s/Hz for WMMSE at r = 0.5, and 536 versus 563 bits/s/Hz at r = 0.7). The cause is attributed to sketch-induced approximation bias in solving ill-conditioned normal equations at high SNR; in contrast, the deterministic WMMSE and LC-WMMSE avoid this issue and retain the higher WSR.

Figure 9. Weighted sum-rate performance with different SNRs under correlated channels (

M = 128

,

K = 16

,

N = 4

).

Table 6. Weighted sum-rate (bits/s/Hz) comparison under spatial correlation conditions.

Remark 2.

(UE correlation): The same framework accommodates UE-side correlation by using

H_{k} = R_{BS}^{1 / 2} W_{k} R_{UE}^{1 / 2}

with, e.g.,

{[R_{UE}]}_{m, n} = r_{UE}^{| m - n |}

.

5. Conclusions

Weighted sum-rate (WSR) maximization is a fundamental problem for massive MU-MIMO systems. In this article, we introduced a novel LC-WMMSE precoding algorithm specifically designed for massive MU-MIMO downlink systems. To significantly reduce the computational runtime relative to the classical WMMSE precoding method, our approach integrates a hybrid switching mechanism and an adaptive damping strategy. The core innovation employs the Woodbury matrix identity to transform the dominant

O (M^{3})

matrix inversion into smaller

O ({(N K)}^{3})

operations, while the hybrid switching dynamically balances the computationally intensive standard WMMSE updates with simpler approximations, controlled by an adaptive mixing factor. Simultaneously, the adaptive damping mechanism ensures stable and monotonic convergence behavior throughout the iterations. Our simulation results show that the LC-WMMSE algorithm significantly reduces practical runtime while maintaining high WSR performance, making it practical for massive MU-MIMO systems. Our approach provides a computationally efficient drop-in replacement for classical WMMSE, achieving near-identical performance with substantially reduced complexity. The LC-WMMSE update also extends to hybrid beamforming architectures via the effective channel

{\tilde{H}}_{k} = H_{k} F_{A}

with

N \leftarrow N_{RF}

; a comprehensive study of hybrid beamforming (incorporating phase constraints, codebooks, and quantization) is deferred to future work. For imperfect CSI, the key challenge is preserving our low-complexity Woodbury/diagonal structure; we will use stochastic/robust WMMSE with diagonal inflations and light Tikhonov regularization so the transmit update remains an

(N K) \times (N K)

SPD solve. For per-antenna power constraints (PAPC), coupling across antennas breaks simple normalization; we will introduce per-antenna dual variables so the update becomes

{(A + diag λ)}^{- 1} B

and compute

λ

via bisection/ADMM, preserving LC complexity.

Author Contributions

Conceptualization, V.S. and H.D.; methodology, V.S. and H.D.; software, V.S. and M.S.; validation, V.S., H.D. and X.X.; formal analysis, V.S. and M.S.; investigation, V.S. and X.X.; resources, V.S.; data curation, V.S.; writing—original draft preparation, V.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Acknowledgments

We are grateful to the High Performance Computing Center of Central South University for assistance with the computations.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wang, M.; Gao, F.; Jin, S.; Lin, H. An Overview of Enhanced Massive MIMO with Array Signal Processing Techniques. IEEE J. Sel. Topics Signal Process. 2019, 13, 886–901.
2. Marzetta, T.L.; Larsson, E.G.; Yang, H.; Ngo, H.Q. Fundamentals of Massive MIMO; Cambridge University Press: Cambridge, UK, 2016.
3. Pereira de Figueiredo, F.A. An Overview of Massive MIMO for 5G and 6G. IEEE Lat. Am. Trans. 2022, 20, 931–940.
4. Zhang, J.; Björnson, E.; Matthaiou, M.; Ng, D.W.K.; Yang, H.; Love, D.J. Prospective Multiple Antenna Technologies for Beyond 5G. IEEE J. Sel. Areas Commun. 2020, 38, 1637–1660.
5. Peng, M.; Sun, Y.; Li, X.; Mao, Z.; Wang, C. Recent Advances in Cloud Radio Access Networks: System Architectures, Key Techniques, and Open Issues. IEEE Commun. Surv. Tutor. 2016, 18, 2282–2308.
6. Sohrabi, F.; Nuzman, C.; Du, J.; Yang, H.; Viswanathan, H. Energy-Efficient Flat Precoding for MIMO Systems. IEEE Trans. Signal Process. 2025, 73, 795–810.
7. Choi, H.; Swindlehurst, A.L.; Choi, J. WMMSE-Based Rate Maximization for RIS-Assisted MU-MIMO Systems. IEEE Trans. Commun. 2024, 72, 5194–5208.
8. Albreem, M.A.; Juntti, M.; Shahabuddin, S. Massive MIMO Detection Techniques: A Survey. IEEE Commun. Surv. Tutor. 2019, 21, 3109–3132.
9. Xu, Y.; Larsson, E.G.; Jorswieck, E.A.; Li, X.; Jin, S.; Chang, T.H. Distributed Signal Processing for Extremely Large-Scale Antenna Array Systems: State-of-the-Art and Future Directions. IEEE J. Sel. Topics Signal Process. 2025, 19, 304–330.
10. Zhao, X.; Lu, S.; Shi, Q.; Luo, Z.Q. Rethinking WMMSE: Can Its Complexity Scale Linearly With the Number of BS Antennas? IEEE Trans. Signal Process. 2023, 71, 433–446.
11. Chen, C.W.; Tsai, W.C.; Wong, S.S.; Teng, C.F.; Wu, A.Y. WMMSE-Based Alternating Optimization for Low-Complexity Multi-IRS MIMO Communication. IEEE Trans. Veh. Technol. 2022, 71, 11234–11239.
12. Liu, Y.F.; Dai, Y.H.; Luo, Z.Q. Coordinated Beamforming for MISO Interference Channel: Complexity Analysis and Efficient Algorithms. IEEE Trans. Signal Process. 2011, 59, 1142–1157.
13. Luo, Z.Q.; Zhang, S. Dynamic Spectrum Management: Complexity and Duality. IEEE J. Sel. Topics Signal Process. 2008, 2, 57–73.
14. Kammoun, A.; Müller, A.; Björnson, E.; Debbah, M. Linear Precoding Based on Polynomial Expansion: Large-Scale Multi-Cell MIMO Systems. IEEE J. Sel. Topics Signal Process. 2014, 8, 861–875.
15. Gao, X.; Edfors, O.; Rusek, F.; Tufvesson, F. Linear Pre-Coding Performance in Measured Very-Large MIMO Channels. In Proceedings of the 2011 IEEE Vehicular Technology Conference (VTC Fall), San Francisco, CA, USA, 5–8 September 2011; pp. 1–5.
16. Nguyen, L.D.; Tuan, H.D.; Duong, T.Q.; Poor, H.V. Multi-User Regularized Zero-Forcing Beamforming. IEEE Trans. Signal Process. 2019, 67, 2839–2853.
17. Shi, C.; Berry, R.A.; Honig, M.L. Monotonic convergence of distributed interference pricing in wireless networks. In Proceedings of the 2009 IEEE International Symposium on Information Theory (ISIT), Seoul, Republic of Korea, 28 June–3 July 2009; pp. 1619–1623.
18. Kim, S.J.; Giannakis, G.B. Optimal Resource Allocation for MIMO Ad Hoc Cognitive Radio Networks. IEEE Trans. Inf. Theory 2011, 57, 3117–3131.
19. Tran, L.N.; Hanif, M.F.; Tolli, A.; Juntti, M. Fast Converging Algorithm for Weighted Sum Rate Maximization in Multicell MISO Downlink. IEEE Signal Process. Lett. 2012, 19, 872–875.
20. Nguyen, D.H.N.; Le-Ngoc, T. Sum-Rate Maximization in the Multicell MIMO Multiple-Access Channel with Interference Coordination. IEEE Trans. Wireless Commun. 2014, 13, 36–48.
21. Shi, Q.; Razaviyayn, M.; Luo, Z.Q.; He, C. An Iteratively Weighted MMSE Approach to Distributed Sum-Utility Maximization for a MIMO Interfering Broadcast Channel. IEEE Trans. Signal Process. 2011, 59, 4331–4340.
22. Bertsekas, D.P. Nonlinear Programming. J. Oper. Res. Soc. 1997, 48, 334.
23. Zhang, Y.; Yu, A.; Tan, X.; Zhang, Z.; You, X.; Zhang, C. Adaptive Damped Jacobi Detector and Architecture for Massive MIMO Uplink. In Proceedings of the 2018 IEEE Asia Pacific Conference on Circuits and Systems (APCCAS), Chengdu, China, 26–30 October 2018; pp. 203–206.
24. Björnson, E.; Sanguinetti, L.; Wymeersch, H.; Hoydis, J.; Marzetta, T.L. Massive MIMO is a reality—What is next?: Five promising research directions for antenna arrays. Digit. Signal Process. 2019, 94, 3–20.
25. Shi, Q.; Xu, W.; Wu, J.; Song, E.; Wang, Y. Secure Beamforming for MIMO Broadcasting With Wireless Information and Power Transfer. IEEE Trans. Wireless Commun. 2015, 14, 2841–2853.
26. Dahrouj, H.; Yu, W. Coordinated beamforming for the multicell multi-antenna wireless system. IEEE Trans. Wireless Commun. 2010, 9, 1748–1759.
27. Parfait, T.; Kuang, Y.; Jerry, K. Performance analysis and comparison of ZF and MRT based downlink massive MIMO systems. In Proceedings of the 2014 Sixth International Conference on Ubiquitous and Future Networks (ICUFN), Shanghai, China, 8–11 July 2014; pp. 383–388.
28. Spencer, Q.; Swindlehurst, A.; Haardt, M. Zero-forcing methods for downlink spatial multiplexing in multiuser MIMO channels. IEEE Trans. Signal Process. 2004, 52, 461–471.

Figure 1. WSR vs. iteration at 20 dB (i.i.d. Rayleigh), averaged over 100 trials. Final WSRs are nearly identical (Adaptive

424.90 \pm 2.01

, Fixed

425.79 \pm 2.03

, None

424.42 \pm 2.12

). Adaptive reaches the plateau in a few iterations, whereas Fixed converges slowly, and None shows larger early overshoots.

Table 1. R-WMMSE and LC-WMMSE. SPD: symmetric positive definite.

	R-WMMSE	LC-WMMSE (Proposed)
Principle	Randomized sketching	Structure exploitation (Woodbury + $diag (W)$ )
Update size	Compressed (by sketch)	$(N K) \times (N K)$ SPD solve
Dominant cost	Sketch products + small solve	Build $A, B$ + SPD solve
Error source	Sketching bias/variance	Neglect of off-diagonals in $W_{k}$

Table 2. Summary of notations. Bold denotes matrices/vectors, Hermitian transpose,

{(\cdot)}^{H}

and Frobenius norm

{∥ \cdot ∥}_{F}

.

Notation	Meaning
M	Number of BS transmit antennas
N	Number of receive antennas per user
K	Number of users
$d_{k}$	Number of streams for user k
$D ≜ \sum_{k = 1}^{K} d_{k}$	Total number of streams
$H_{k} \in C^{M \times N}$	Channel from BS to user k
$P_{k} \in C^{M \times d_{k}}$	Precoder for user k
$P = [P_{1}, \dots, P_{K}] \in C^{M \times D}$	Stacked precoder (all users)
$s_{k} \in C^{d_{k} \times 1}$	Data vector for user k
$x \in C^{M \times 1}$	Transmit signal
$y_{k} \in C^{N \times 1}$	Received signal at user k
$n_{k} \in C^{N \times 1}$	AWGN at user k
$σ^{2}$	Noise power (per receive antenna)
$P_{max}$	Total transmit power (SPC)
$U_{k} \in C^{N \times d_{k}}$	MMSE receive filter for the user k
$W_{k} \in C^{d_{k} \times d_{k}}$	Weight matrix for the k-th user
$E_{k} \in C^{d_{k} \times d_{k}}$	MSE matrix for user k
$D_{k} = diag (diag (W_{k}))$	Diagonal weight approximation
$S_{x} = \sum_{j = 1}^{K} P_{j} P_{j}^{H} \in C^{M \times M}$	BS transmit covariance
$H = [H_{1}, \dots, H_{K}] \in C^{M \times (N K)}$	Stacked channel
$S = blkdiag (U_{1} D_{1} U_{1}^{H}, \dots, U_{K} D_{K} U_{K}^{H}) \in C^{N K \times N K}$	Block diag. weight (LC)
$B_{LC} = [H_{1} U_{1} D_{1}, \dots, H_{K} U_{K} D_{K}] \in C^{M \times D}$	RHS factor (LC update)
$B_{class} = [H_{1} U_{1} W_{1}, \dots, H_{K} U_{K} W_{K}] \in C^{M \times D}$	RHS factor (classical)
$G = H^{H} H \in C^{(N K) \times (N K)}$	Stacked Gram matrix
$I_{M}, I_{N}, I_{d_{k}}$	Identity matrices of sizes M, N, $d_{k}$
$diag (\cdot), blkdiag (\cdot), tr (\cdot)$	Standard operators
$P_{WMMSE}^{(t)}$	Classical WMMSE precoder (iter. t)
$P_{LC}^{(t)}$	LC–WMMSE precoder (iter. t)
$ω^{(t)}$	Hybrid switching factor (iter. t)
$α^{(t)}$	Adaptive damping factor (iter. t)
$W S R^{(t)}$	Weighted sum-rate at iter. t (bps/Hz)

Table 3. Ablation study of damping mechanisms (mean over 100 trials at 20 dB).

Metric	Adaptive	Fixed ( $α = 0.8$ )	None
Final WSR [bit/s/Hz]	$424.90 \pm 2.01$	$425.79 \pm 2.03$	$424.42 \pm 2.12$
Iterations (median)	50	50	50
Oscillation index	0.067	0.012	0.101

Table 4. Per-iteration computational complexity.

Operation	Classical WMMSE	LC–WMMSE (Woodbury)
Precoder solve	$O (M^{3})$	$O (M {(N K)}^{2}) + O ({(N K)}^{3})$
Per-user $N \times N$ factorizations	$O (K N^{3})$	$O (K N^{3})$
Gram products ( $H^{H} H$ , $H^{H} B$ )	$O (M {(N K)}^{2})$	$O (M {(N K)}^{2})$
Hybrid switch $ω^{(t)}$	–	$O (K N^{2})$
Adaptive damping $α^{(t)}$	–	$O (1)$
Total (dominant)	$O (M^{3} + K N^{3})$	$O (M {(N K)}^{2} + {(N K)}^{3} + K N^{3})$

Table 5. Practical regime (0–30 dB) summary of the relative WSR gap

Δ_{rel} (%) = 100 ({WSR}_{WMMSE} - {WSR}_{LC}) / {WSR}_{WMMSE}

. Values are mean and std across SNRs; worst-loss is the maximum positive gap; best-gain is the maximum

- min Δ_{rel}

(LC over WMMSE). A negative mean indicates LC-WMMSE exceeds classical WMMSE. Setup:

M = 128

,

N = 4

,

K = 16

, SPC

P_{m a x} = 10

, 100 trials/SNR.

Model	Mean ↓	Std	Worst-Loss ↓	Best-Gain ↑
i.i.d. Rayleigh	−0.440	$0.294$	$0.234$	$4.843$
Correlated ( $r = 0.5$ )	−0.480	$0.371$	$0.218$	$6.529$

Table 6. Weighted sum-rate (bits/s/Hz) comparison under spatial correlation conditions.

Algorithm	Moderate (r = 0.5), 10 dB	Moderate (r = 0.5), 20 dB	Moderate (r = 0.5), 30 dB	Strong (r = 0.7), 10 dB	Strong (r = 0.7), 20 dB	Strong (r = 0.7), 30 dB
WMMSE	201	398	603	170	359	563
LC-WMMSE	201	399	604	170	359	564
R-WMMSE	209	401	557	181	364	536
ZF	5	37	146	3	28	130
BD	32	127	290	23	103	258

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
