Deep Learning-Enhanced Hybrid Beamforming Design with Regularized SVD Under Imperfect Channel Information

Azizi, S. Pourmohammad; Nafei, Amirhossein; Chen, Shu-Chuan; Lin, Rong-Ho

doi:10.3390/math14030509


¹

Department of Electrical Engineering, National Taiwan Ocean University, Keelung 202301, Taiwan

²

Department of Industrial Engineering and Management, National Taipei University of Technology, Taipei 106344, Taiwan

³

College of Management and Design, Ming Chi University of Technology, New Taipei 243303, Taiwan

^*

Author to whom correspondence should be addressed.

Mathematics 2026, 14(3), 509; https://doi.org/10.3390/math14030509

Submission received: 21 December 2025 / Revised: 14 January 2026 / Accepted: 30 January 2026 / Published: 31 January 2026

(This article belongs to the Special Issue Computational Methods in Wireless Communications with Applications)


Abstract

We propose a low-complexity hybrid beamforming method for massive Multiple-Input Multiple-Output (MIMO) systems that is robust to Channel State Information (CSI) estimation errors. These errors stem from hardware impairments, pilot contamination, limited training, and fast fading, and they cause spectral-efficiency loss. Existing hybrid beamforming solutions typically either assume near-perfect CSI or rely on greedy/black-box designs without an explicit mechanism to regularize the error-distorted singular modes, leaving a gap in unified, low-complexity, and theoretically grounded robustness. We unfold the Alternating Direction Method of Multipliers (ADMM) into a trainable Deep Learning (DL) network, termed DL-ADMM, to jointly optimize Radio-Frequency (RF) and baseband precoders and combiners. In DL-ADMM, the ADMM update mappings are learned (layer-wise parameters and projections) to amortize the joint RF/baseband optimization, whereas Regularized Singular Value Decomposition (RSVD) acts as an analytical regularizer that reshapes the observed channel’s singular values to stabilize the singular modes and suppress noise amplification under imperfect CSI, yielding a unified and scalable design. For

σ_{e}^{2} = 0.1

, the proposed DL-ADMM-Reg achieves approximately 8–11 bits/s/Hz higher spectral efficiency than Orthogonal Matching Pursuit (OMP) at Signal-to-Noise Ratio (SNR)

= 20

–40 dB, while remaining within 1 bit/s/Hz of the digital-optimal benchmark across both

(N_{t}, N_{r}) = (32, 32)

and

(64, 64)

settings. Simulations confirm higher spectral efficiency and robustness than OMP and Adaptive Phase Shifters (APSs).

Keywords:

hybrid beamforming; massive MIMO; Deep Learning; Regularized Singular Value Decomposition

MSC:

68T07; 65K10

1. Introduction

Massive Multiple-Input Multiple-Output (MIMO) technology has emerged as a cornerstone of modern wireless communication systems, providing remarkable improvements in spectral efficiency, energy efficiency, and overall system capacity. By employing large antenna arrays, massive MIMO enables simultaneous multi-user transmission and robust interference suppression, which are critical for beyond-5G and 6G networks [1]. To exploit these benefits, efficient beamforming strategies are indispensable. Hybrid beamforming, which integrates low-dimensional digital baseband processing with high-dimensional analog phase shifters, offers a promising compromise between hardware cost and spectral performance, especially in millimeter-wave (mmWave) and terahertz (THz) communication systems [2,3]. Despite its advantages, hybrid beamforming strongly depends on accurate Channel State Information (CSI). In practice, CSI acquisition is limited by pilot overhead, channel non-reciprocity, hardware impairments, and fast-fading environments, which lead to channel estimation errors [4,5]. These errors misguide the beamforming design, causing mismatches in precoders and combiners and ultimately degrading spectral efficiency and channel capacity. In massive MIMO with hundreds of antennas, such mismatches are amplified due to the high dimensionality of the channel, making robustness to CSI errors a critical requirement [6,7].

Several methods have been proposed to address hybrid beamforming design. Conventional algorithms such as Greedy Hybrid Precoding (GHP) [8], which sequentially selects analog beamformers to maximize array gain, Phase Pursuit (PP) [9], which refines phase shifter values in an iterative manner to approximate the optimal precoder, and Orthogonal Matching Pursuit (OMP) [10], which exploits sparse representations of the channel to iteratively select the best matching beams, achieve reasonable performance under perfect CSI. However, these methods face several shortcomings. They are highly sensitive to CSI accuracy; even small estimation errors can result in incorrect selection of baseband and RF chain values, leading to significant spectral efficiency degradation [11]. In addition, many designs rely on predefined analog codebooks, which restrict the search space and fail to adapt to dynamic environments, especially under mobility or non-stationary channels [10,12]. Another drawback is their high computational complexity, as iterative schemes like OMP scale poorly with large antenna arrays and quickly become impractical for real-time systems [13]. More recently, the Adaptive Phase Shifters (APSs) hybrid design [14] has been proposed to reduce the number of required phase shifters and hardware cost while maintaining good spectral efficiency, but robustness under imperfect CSI remains a challenge. Finally, even when channel estimation improves, residual mismatches between the actual and estimated channels remain unavoidable, limiting the performance of these approaches.

In summary, existing hybrid beamforming methods still exhibit two fundamental limitations. First, they are often highly sensitive to noise and CSI estimation errors, where small perturbations can distort the effective singular subspaces and lead to beam misalignment and pronounced spectral-efficiency loss. Second, their robustness is limited in dynamic channels, since codebook/greedy selection and iterative refinement may fail to adapt reliably under mobility and non-stationary propagation, and can converge to suboptimal solutions in large arrays. These issues motivate a design that is both computationally efficient and explicitly regularizes the error-corrupted channel structure.

1.1. Deep Learning-Based Hybrid Beamforming

To overcome these limitations, data-driven strategies have recently gained traction. Machine Learning (ML) and Deep Learning (DL)-based methods have been applied to hybrid beamforming, CSI estimation, and channel prediction [15,16,17]. Although DL frameworks can approximate nonlinear mappings and significantly reduce computational burden, they also have weaknesses. Many such solutions operate as black boxes, lacking interpretability and theoretical guarantees, which makes their performance difficult to predict under varying CSI error statistics. Recent DL-based hybrid beamforming has been explored via supervised networks that learn CSI-to-hybrid-precoder/combiner mappings for mmWave MIMO [18] and via two-stage neural architectures for hybrid precoder design in very-large-scale massive MIMO systems [19]. These predictor-centric approaches motivate our model-driven unfolded design with explicit spectral stabilization to improve robustness under CSI uncertainty. Furthermore, DL models often suffer from limited generalization; those trained on specific datasets may fail to perform reliably in different propagation environments. In addition, the training process itself is demanding, requiring large amounts of labeled data and substantial computational resources, which may not always be practical in real-world deployments.

While standard DL can learn an end-to-end mapping from the observed channel to hybrid beamformers, it is often insufficient under imperfect CSI because it does not explicitly control the conditioning of the error-corrupted channel matrix. In particular, CSI errors can distort the singular spectrum, where small (and noise-sensitive) singular modes induce noise amplification and unstable beam alignment; a purely data-driven model may inadvertently fit these distortions, leading to limited generalization across changing error statistics and propagation environments. Parallel to DL-based research, mathematical regularization techniques have been proposed to address noise sensitivity in matrix factorization. Regularized Singular Value Decomposition (RSVD) has demonstrated effectiveness in suppressing noise amplification and stabilizing solutions to ill-posed problems [20,21]. However, RSVD alone does not address the joint optimization of digital and analog beamformers, and its direct application to hybrid beamforming remains underexplored.

The coexistence of these two streams—DL-based approximators and RSVD-based mathematical stabilizers—suggests a complementary integration. DL frameworks can provide efficient approximations for joint hybrid beamforming optimization, while RSVD offers theoretical robustness against channel estimation errors. Yet, to the best of our knowledge, limited work has explored a unified design that leverages both strategies to achieve low-complexity and error-resilient hybrid beamforming in massive MIMO systems.

In summary, recent studies have investigated DL for hybrid beamforming through (i) end-to-end (E2E) supervised networks that directly learn a mapping from (possibly imperfect) CSI to analog/digital precoders and combiners [22], (ii) convolutional neural network (CNN)-assisted designs that exploit channel covariance/features to predict codebook indices or analog phase profiles [23], and (iii) unsupervised/label-free training that maximizes a differentiable spectral-efficiency objective [24]. While these data-driven approaches can reduce online computation, they are often treated as black-box predictors with limited interpretability and may generalize poorly when the CSI error statistics, signal-to-noise ratio (SNR), or array dimensions deviate from the training distribution; moreover, robustness is typically induced implicitly via data augmentation rather than by an explicit conditioning/regularization mechanism [22,23,24]. In contrast, our method is model-driven: the proposed deep-unfolded Alternating Direction Method of Multipliers (DL-ADMM) preserves the algorithmic structure of the constrained hybrid beamforming problem, where each layer corresponds to a principled ADMM update with learnable step sizes/penalties and closed-form projection operators, yielding interpretable intermediate variables (primal/dual iterates) and predictable complexity scaling. Furthermore, RSVD is introduced as an analytical regularizer that reshapes the error-distorted singular modes of the observed channel to suppress noise amplification, thereby providing a theoretically grounded robustness component that complements the learned optimizer.

1.2. Motivation and Contribution

Motivated by these gaps, this paper introduces a novel Regularized Hybrid Beamforming framework that integrates a DL-enhanced Alternating Direction Method of Multipliers (ADMM) [25] with RSVD theory. By simultaneously optimizing the baseband and RF chains while regularizing the singular values of the observed channel, the proposed scheme mitigates the detrimental effects of estimation mismatches, reduces computational complexity, and enhances spectral efficiency across a wide range of system conditions.

The main contributions of this work are summarized as follows:

DL-Enhanced ADMM for Hybrid Beamforming: We enhance the ADMM [25] by embedding it within a DL-based architecture, termed DL-ADMM, enabling low-complexity joint optimization of transmitter and receiver baseband and RF chain components.
RSVD-Based Robustness: We incorporate RSVD theory [20] into the hybrid beamforming framework to suppress noise amplification and strengthen singular values of the observed channel, thereby mitigating the impact of CSI estimation errors.
Theoretical Foundation: We establish new theoretical results (Theorem 1 and Corollary 1) that prove RSVD achieves lower noise sensitivity compared to standard SVD, providing rigorous justification for its integration into hybrid beamforming.
Complexity and Efficiency Analysis: We analyze the computational complexity of the proposed scheme and compare it against conventional methods such as OMP and APS, demonstrating that DL-ADMM achieves reduced per-iteration complexity without sacrificing performance.
Extensive Simulations: Through comprehensive experiments under diverse antenna configurations, SNR ranges, and channel estimation error variances, we validate that the proposed DL-ADMM-Reg outperforms baseline approaches in terms of spectral efficiency, robustness, and stability.

The remainder of this paper is structured as follows. Section 2 introduces the system model and problem formulation for spectral efficiency under imperfect CSI. Section 3 presents the proposed regularized hybrid beamforming framework, including the DL-ADMM scheme and RSVD-based regularization. Section 4 provides extensive simulation results to benchmark the proposed method against conventional algorithms. Finally, Section 5 concludes the paper and outlines potential directions for future research.

2. System Model and Spectral Efficiency

2.1. System Model

Consider a point-to-point massive MIMO system with

N_{t}

transmit antennas and

N_{r}

receive antennas, designed to support

N_{s}

independent data streams as depicted in Figure 1. Hybrid beamforming is adopted to reduce the number of required RF chains, which are denoted by

N_{R F t}

and

N_{R F r}

for the transmitter and receiver, respectively. These values satisfy the practical constraint

N_{s} \leq N_{R F t} \leq N_{t}

and

N_{s} \leq N_{R F r} \leq N_{r}

. The baseband equivalent input–output relation is expressed as

\tilde{y} = \sqrt{ϱ} W_{B B}^{*} W_{R F}^{*} H F_{R F} F_{B B} x + W_{B B}^{*} W_{R F}^{*} n,

(1)

where

\tilde{y} \in C^{N_{s} \times 1}

is the processed received signal,

ϱ

is the average transmit power, and

x \in C^{N_{s} \times 1}

represents the transmitted symbols with normalized power. The additive noise vector is

n \sim CN (0, σ^{2} I_{N_{r}})

.

Hybrid beamforming employs both analog and digital components: the transmitter uses analog precoding

F_{R F} \in C^{N_{t} \times N_{R F t}}

and digital precoding

F_{B B} \in C^{N_{R F t} \times N_{s}}

, while the receiver uses analog combining

W_{R F} \in C^{N_{r} \times N_{R F r}}

and digital combining

W_{B B} \in C^{N_{R F r} \times N_{s}}

. Typically, the analog precoder/combiner entries are constrained to have unit modulus due to hardware phase shifters, i.e.,

| {[F_{R F}]}_{i, j} | = 1

and

| {[W_{R F}]}_{i, j} | = 1

[10,13].

The physical channel

H

is modeled by a clustered geometric model that captures multipath propagation in mmWave and sub-THz systems:

H = \sqrt{\frac{N_{r} N_{t}}{N_{c l} N_{r a y}}} \sum_{i = 1}^{N_{c l}} \sum_{j = 1}^{N_{r a y}} α_{i j} a_{r} (ϕ_{i j}^{r}, θ_{i j}^{r}) a_{t}^{H} (ϕ_{i j}^{t}, θ_{i j}^{t}),

(2)

where

N_{c l}

and

N_{r a y}

denote the number of scattering clusters and rays per cluster, respectively. The path gain of the j-th ray in the i-th cluster is represented by

α_{i j} \sim CN (0, σ_{α}^{2})

. The vectors

a_{t} (\cdot)

and

a_{r} (\cdot)

are the array response vectors of the transmit and receive uniform planar arrays (UPAs), parameterized by the azimuth and elevation angles of departure and arrival, respectively.

To explicitly reflect the channel conditions, we note that the clustered model in (2) implies an angular-domain sparsity typical for mmWave/sub-THz propagation, where only a small number of dominant paths contribute to the channel. Let

A_{t} \in C^{N_{t} \times G_{t}}

and

A_{r} \in C^{N_{r} \times G_{r}}

denote transmit/receive array-response dictionaries constructed on quantized AoD/AoA grids. The channel can be equivalently expressed in a sparse beamspace form as

H = A_{r} H_{b} A_{t}^{H}, {∥ H_{b} ∥}_{0} \approx N_{c l} N_{r a y} ≪ N_{r} N_{t},

(3)

where

H_{b}

is sparse with non-zero entries corresponding to the dominant propagation directions.

Regarding noise statistics, the receiver noise is modeled as spatially white complex Gaussian,

n \sim CN (0, σ^{2} I_{N_{r}})

, and the post-combining effective noise becomes

\tilde{n} = W_{B B}^{*} W_{R F}^{*} n \sim CN (0, σ^{2} W_{B B}^{*} W_{R F}^{*} W_{R F} W_{B B}),

(4)

which reduces to

CN (0, σ^{2} I_{N_{s}})

under the common semi-unitary normalization

W_{B B}^{*} W_{R F}^{*} W_{R F} W_{B B} = I_{N_{s}} .

(5)

Finally, to model imperfect channel knowledge, we assume that beamformer design is based on an estimated channel

\hat{H} = H + E, vec (E) \sim CN (0, σ_{e}^{2} I_{N_{r} N_{t}}),

(6)

where

σ_{e}^{2}

controls the CSI error level (i.e., the estimation NMSE). Consequently, all subsequent optimization/design steps use

\hat{H}

rather than the ideal

H

, motivating the proposed robustness mechanisms.
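The channel model in (2) and the CSI error model in (6) can be sketched numerically as follows. This is a minimal illustration, not the authors' simulator: it assumes half-wavelength uniform linear arrays instead of the UPAs used in the paper, unit-variance path gains, and a Gaussian angular spread per cluster. The function names (`ula_response`, `clustered_channel`, `observe_channel`) and the `spread` parameter are hypothetical.

```python
import numpy as np

def ula_response(n_ant, angle):
    """Normalized half-wavelength ULA response (assumption: the paper uses UPAs)."""
    idx = np.arange(n_ant)
    return np.exp(1j * np.pi * idx * np.sin(angle)) / np.sqrt(n_ant)

def clustered_channel(n_r, n_t, n_cl, n_ray, rng, spread=0.1):
    """Clustered geometric channel in the spirit of (2), with CN(0, 1) path gains."""
    H = np.zeros((n_r, n_t), dtype=complex)
    for _ in range(n_cl):
        # Mean arrival/departure angle of the cluster (assumed uniform).
        aoa_c, aod_c = rng.uniform(-np.pi / 2, np.pi / 2, size=2)
        for _ in range(n_ray):
            alpha = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2)
            a_r = ula_response(n_r, aoa_c + spread * rng.standard_normal())
            a_t = ula_response(n_t, aod_c + spread * rng.standard_normal())
            H += alpha * np.outer(a_r, a_t.conj())   # alpha * a_r * a_t^H
    return np.sqrt(n_r * n_t / (n_cl * n_ray)) * H

def observe_channel(H, sigma_e2, rng):
    """Estimated channel as in (6): H_hat = H + E, entries of E i.i.d. CN(0, sigma_e2)."""
    E = np.sqrt(sigma_e2 / 2) * (
        rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape)
    )
    return H + E

rng = np.random.default_rng(0)
H = clustered_channel(32, 32, n_cl=4, n_ray=5, rng=rng)
H_hat = observe_channel(H, sigma_e2=0.1, rng=rng)
```

Because H is a sum of N_cl N_ray rank-one terms, its rank is at most 20 in this configuration, which is the low-rank structure that the sparse beamspace form (3) expresses.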

2.2. Achievable Spectral Efficiency

Based on (1) and (2), the effective precoder and combiner can be written as

F = F_{R F} F_{B B}

and

W = W_{R F} W_{B B}

. Ideally, the optimal solution is obtained by aligning these matrices with the right and left singular vectors of the channel, respectively. By computing the singular value decomposition (SVD) of

H = U S V^{*}

, the optimal digital-only precoder and combiner are defined as

F_{opt} = V (:, 1 : N_{s})

and

W_{opt} = U (:, 1 : N_{s})

. However, the hybrid constraints prohibit direct implementation, and the near-optimal hybrid design must solve

\min_{W_{B B}, W_{R F}} ∥ W_{opt} - W_{R F} W_{B B} ∥_{F} \quad \text{s.t.} \quad ∥ W_{R F} W_{B B} ∥_{F} = \sqrt{N_{s}}, \; | [W_{R F}]_{i, j} | = 1,

\min_{F_{B B}, F_{R F}} ∥ F_{opt} - F_{R F} F_{B B} ∥_{F} \quad \text{s.t.} \quad ∥ F_{R F} F_{B B} ∥_{F} = \sqrt{N_{s}}, \; | [F_{R F}]_{i, j} | = 1,

(7)

where

∥ \cdot ∥_{F}

denotes the Frobenius norm. Such approximations are common in hybrid beamforming design, and various algorithms such as OMP [10] have been proposed to solve (7) with different complexity–performance trade-offs.

Once the hybrid precoders and combiners are obtained, the spectral efficiency is defined as

R = {log}_{2} (\det (I_{N_{s}} + \frac{ϱ}{N_{s}} R_{n}^{- 1} G G^{*})),

(8)

where

G = W^{*} H F

is the effective channel after hybrid processing, and

R_{n} = σ^{2} W^{*} W

is the equivalent noise covariance.

When channel estimation errors occur, the observed channel is modeled as

H_{obs} = H + H_{e},

(9)

where

H_{e}

represents the estimation error matrix with variance

σ_{e}^{2}

[4,5]. Performing SVD on

H_{obs} = \hat{U} \hat{S} \hat{V}^{*}

leads to inaccurate beamforming solutions

\hat{F}

and

\hat{W}

. The corresponding achievable spectral efficiency is

\hat{R} = {log}_{2} (\det (I_{N_{s}} + \frac{ϱ}{N_{s}} {\hat{R}}_{n}^{- 1} \hat{G} {\hat{G}}^{*})),

(10)

with

\hat{G} = \hat{W}^{*} H \hat{F}

and

{\hat{R}}_{n} = σ^{2} \hat{W}^{*} \hat{W}

.

In the case of perfect CSI, the spectral efficiency in (8) using

F_{opt}, W_{opt}

serves as the upper bound and is often referred to as the Optimal performance. Under imperfect CSI, the spectral efficiency is inevitably reduced to a lower bound that we term Mismatch, which no conventional algorithm can surpass [6,26,27]. This highlights the inefficiency of traditional approaches under estimation errors and motivates the development of robust and regularized beamforming strategies.
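The gap between the Optimal and Mismatch curves can be reproduced in a few lines. The sketch below evaluates (8) and (10) with fully digital SVD beamformers. It assumes an i.i.d. Rayleigh channel as a stand-in for the clustered model (2) and unit noise power, and the helper names `svd_beams` and `spectral_efficiency` are hypothetical.

```python
import numpy as np

def svd_beams(H, n_s):
    """Digital precoder/combiner from the leading N_s singular vectors."""
    U, _, Vh = np.linalg.svd(H)
    return Vh.conj().T[:, :n_s], U[:, :n_s]          # F, W

def spectral_efficiency(H, F, W, rho, sigma2):
    """Spectral efficiency (8) for effective precoder F and combiner W."""
    n_s = F.shape[1]
    G = W.conj().T @ H @ F                            # effective channel
    R_n = sigma2 * (W.conj().T @ W)                   # equivalent noise covariance
    M = np.eye(n_s) + (rho / n_s) * np.linalg.solve(R_n, G @ G.conj().T)
    _, logdet = np.linalg.slogdet(M)                  # numerically stable log-det
    return logdet / np.log(2)

rng = np.random.default_rng(1)
n_r, n_t, n_s, sigma_e2 = 16, 16, 2, 0.1
H = (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))) / np.sqrt(2)
H_e = np.sqrt(sigma_e2 / 2) * (
    rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
)
H_obs = H + H_e                                       # observed channel (9)

F_opt, W_opt = svd_beams(H, n_s)                      # designed on the true channel
F_hat, W_hat = svd_beams(H_obs, n_s)                  # designed on the observed channel
snr = 10 ** (20 / 10)                                 # rho / sigma^2 at 20 dB
R_opt = spectral_efficiency(H, F_opt, W_opt, rho=snr, sigma2=1.0)
R_mis = spectral_efficiency(H, F_hat, W_hat, rho=snr, sigma2=1.0)
```

Both rates are evaluated on the true channel H; only the design step sees H_obs. For semi-unitary F and W, the singular values of W^{*} H F cannot exceed the leading singular values of H, so `R_mis` cannot exceed `R_opt`.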

2.3. Conventional ADMM for Hybrid Beamforming

In this section, we detail the conventional ADMM [25] procedure used to approximate the optimal unconstrained singular-vector beams under hybrid (unit–modulus) constraints.

Problem restatement

Given a target matrix

O \in C^{a \times c}

(e.g.,

W_{opt}

or

F_{opt}

from the SVD of

H

), the hybrid factorization seeks

\min_{R, B} ∥ O - R B ∥_{F}^{2} \quad \text{s.t.} \quad ∥ R B ∥_{F} = ℘, \; | [R]_{i, j} | = 1,

(11)

where

℘

is the normalization (power) constant,

R \in C^{a \times b}

represents the analog (unit–modulus) matrix, and

B \in C^{b \times c}

is the digital baseband matrix.

Our goal is to obtain a hardware-feasible hybrid realization that closely approximates the SVD-derived fully digital target matrix

O \in C^{a \times c}

(e.g.,

F_{opt}

or

W_{opt}

). Specifically, we seek matrices

R \in C^{a \times b}

and

B \in C^{b \times c}

such that the product

R B

minimizes the approximation error

{∥ O - R B ∥}_{F}^{2}

, while satisfying the unit-modulus constraint

| [R]_{i, j} | = 1

imposed by analog phase shifters and the power-normalization constraint

{∥ R B ∥}_{F} = ℘

(where

℘

is a known constant). Therefore, (11) provides a clear restatement of the hybrid beamforming/combining design as a constrained matrix factorization problem whose solution yields the closest feasible hybrid approximation to

O

.

Augmented Lagrangian

Introduce an auxiliary variable

Z \in C^{a \times c}

and multiplier

Y \in C^{a \times c}

to decouple the bilinear term via the constraint

Z = R B

. The (scaled) augmented Lagrangian reads

L_{ρ} (R, B, Z, Y) = ∥ Z - O ∥_{F}^{2} + \frac{ρ}{2} ∥ Z - R B + \frac{1}{ρ} Y ∥_{F}^{2},

(12)

with penalty

ρ > 0

.

Z-update (quadratic projection)

Fix

(R, B, Y)

. Minimizing (12) with respect to

Z

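yields a closed form (exact when the fidelity term is weighted by 1/2; with the unweighted term in (12), ρ in the expression is replaced by ρ/2):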

Z^{(k + 1)} = arg \min_{Z} ∥ Z - O ∥_{F}^{2} + \frac{ρ}{2} ∥ Z - (R^{(k)} B^{(k)} - \frac{1}{ρ} Y^{(k)}) ∥_{F}^{2} = \frac{O + ρ (R^{(k)} B^{(k)} - \frac{1}{ρ} Y^{(k)})}{1 + ρ} .

(13)

B-update (least squares with power normalization)

Fix

(R, Z, Y)

. The digital matrix first solves an unconstrained least-squares problem:

B_{unc}^{(k + 1)} = arg \min_{B} ∥ Z^{(k + 1)} - R^{(k)} B + \frac{1}{ρ} Y^{(k)} ∥_{F}^{2} = {(R^{(k) *} R^{(k)})}^{- 1} R^{(k) *} (Z^{(k + 1)} + \frac{1}{ρ} Y^{(k)}) .

(14)

Enforce the Frobenius-norm power constraint in (11) by scaling:

B^{(k + 1)} = B_{unc}^{(k + 1)} \frac{℘}{∥ R^{(k)} B_{unc}^{(k + 1)} ∥_{F}} .

(15)

R-update (unit–modulus projection)

Fix

(B, Z, Y)

. Disregarding (temporarily) the modulus constraint, the analog update solves

R_{ls}^{(k + 1)} = arg \min_{R} ∥ Z^{(k + 1)} - R B^{(k + 1)} + \frac{1}{ρ} Y^{(k)} ∥_{F}^{2} = (Z^{(k + 1)} + \frac{1}{ρ} Y^{(k)}) B^{(k + 1) *} {(B^{(k + 1)} B^{(k + 1) *})}^{- 1} .

(16)

Project

R_{ls}^{(k + 1)}

element-wise onto the complex unit circle to enforce

| [R]_{i, j} | = 1

:

R^{(k + 1)} = \exp (j ∠ (R_{ls}^{(k + 1)})), \quad \text{i.e.,} \quad [R^{(k + 1)}]_{i, j} = e^{j ∠ ([R_{ls}^{(k + 1)}]_{i, j})} .

(17)

Dual update (scaled multipliers)

Finally update the Lagrange multiplier:

Y^{(k + 1)} = Y^{(k)} + ρ (Z^{(k + 1)} - R^{(k + 1)} B^{(k + 1)}) .

(18)

Stopping rule and feasibility

The iterations (13)–(18) continue until primal and dual residuals fall below tolerances, e.g.,

\underbrace{∥ Z^{(k + 1)} - R^{(k + 1)} B^{(k + 1)} ∥_{F}}_{\text{primal residual}} \leq ε_{pri}, \quad \underbrace{ρ ∥ Z^{(k + 1)} - Z^{(k)} ∥_{F}}_{\text{dual residual}} \leq ε_{dual} .

(19)

At convergence, scale

B

once more if necessary to satisfy

{∥ R B ∥}_{F} = ℘

exactly, cf. (15).
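The iteration (13)–(19) can be sketched in NumPy as below. This is an illustrative implementation under stated assumptions, not the authors' code: the Z-step is coded as printed in (13), a least-squares solve and a pseudo-inverse replace the explicit inverses in (14) and (16) so that the sketch stays well defined when B B^{*} is rank-deficient (b > c), and a single tolerance is used for both residuals. The function names and default values are hypothetical.

```python
import numpy as np

def unit_modulus(X):
    """Element-wise projection onto the complex unit circle, as in (17)."""
    return np.exp(1j * np.angle(X))

def admm_hybrid(O, b, power, rho=1.0, iters=200, tol=1e-6, seed=0):
    """Approximate O (a x c) by R @ B with |R_ij| = 1 and ||R B||_F = power."""
    rng = np.random.default_rng(seed)
    a, c = O.shape
    R = unit_modulus(rng.standard_normal((a, b)) + 1j * rng.standard_normal((a, b)))
    B = rng.standard_normal((b, c)) + 1j * rng.standard_normal((b, c))
    B *= power / np.linalg.norm(R @ B)
    Y = np.zeros((a, c), dtype=complex)
    Z = R @ B
    for _ in range(iters):
        Z_prev = Z
        Z = (O + rho * (R @ B - Y / rho)) / (1.0 + rho)      # Z-update (13)
        T = Z + Y / rho
        B = np.linalg.lstsq(R, T, rcond=None)[0]              # B-update (14)
        B *= power / np.linalg.norm(R @ B)                    # power scaling (15)
        R = unit_modulus(T @ np.linalg.pinv(B))               # R-update (16)-(17)
        Y = Y + rho * (Z - R @ B)                             # dual update (18)
        primal = np.linalg.norm(Z - R @ B)
        dual = rho * np.linalg.norm(Z - Z_prev)
        if primal <= tol and dual <= tol:                     # stopping rule (19)
            break
    B *= power / np.linalg.norm(R @ B)                        # final rescaling
    return R, B

# Example target: two orthonormal columns, as W_opt or F_opt would be.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((16, 16)) + 1j * rng.standard_normal((16, 16)))
O = Q[:, :2]
R, B = admm_hybrid(O, b=4, power=np.sqrt(2))
approx_error = np.linalg.norm(O - R @ B)
```

The returned pair is feasible by construction (unit-modulus R, normalized product). Because the problem is non-convex, the sketch makes no claim about how small `approx_error` becomes for a given ρ or initialization.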

The conventional ADMM updates above require repeated large matrix multiplications and inversions in (14)–(16), careful penalty selection

ρ

, and full reruns for every new channel realization; under imperfect CSI, the iterates may converge to sub-optimal factorizations that reduce spectral efficiency. The DL-unfolded counterpart replaces hand-tuned quantities (e.g.,

ρ

, preconditioners, projections) with trainable mappings, amortizes computation across channel instances, and incorporates robustness terms aligned with the SVD structure of the observed channel, thereby improving convergence speed and resilience to estimation errors while keeping per-layer operations inexpensive.

3. DL-ADMM-Assisted Regularized Hybrid Beamforming

This section aims to reduce the computational complexity of hybrid beamforming design under imperfect CSI while simultaneously enhancing the achievable channel capacity. To achieve this, we integrate DL into the ADMM, yielding a DL-ADMM scheme that leverages both data-driven approximation and model-based optimization.

3.1. DL-ADMM Scheme

The hybrid beamforming design problem can be formulated as the constrained optimization task

\min_{B, R} ∥ O - R B ∥_{F} \quad \text{s.t.} \quad ∥ R B ∥_{F} = ℘, \; | [R]_{i, j} | = 1,

(20)

where

℘

is the normalization (power) constant,

O \in C^{a \times c}

represents the target matrix (either the optimal precoder or combiner),

R \in C^{a \times b}

is the analog beamformer constrained to unit-modulus entries, and

B \in C^{b \times c}

is the corresponding digital component. This problem is inherently non-convex due to the modulus constraints and coupled variables [10,12]. Traditional algorithms such as OMP and alternating minimization are computationally demanding, and they require multiple iterations that scale poorly with antenna array size [12,13].

To efficiently solve (20), we adopt a DL-enhanced ADMM framework. By integrating DL into ADMM, the iterative update rules are unfolded into a layer-wise neural network architecture, where trainable parameters are learned offline and directly applied at runtime, thus drastically reducing online computational complexity. Hence, reformulating (20) through the augmented Lagrangian leads to

\begin{matrix} \min_{(R, B, Z, Y)} {∥ Z - O ∥}_{F}^{2} + 〈 Y, Z - R B 〉 + \frac{ρ}{2} {∥ Z - R B ∥}_{F}^{2}, \end{matrix}

(21)

where

ρ

is the penalty parameter,

Z

is an auxiliary variable, and

Y

is the Lagrange multiplier. The optimization proceeds by alternating updates for

Z

,

R

, and

B

, followed by dual variable updates for

Y

. Each update step is mapped into a neural network layer, with trainable weights

ρ^{(k)}

,

Ω^{(k)}

, and initializations learned from data.

Algorithm 1 summarizes the DL-ADMM procedure, where each iteration corresponds to one network layer. This unrolling strategy allows the network to inherit the interpretability and convergence properties of ADMM while learning problem-specific parameters that improve performance and robustness to imperfect CSI. Such “algorithm unfolding” methods have recently gained wide attention in wireless communications, as they achieve near-optimal accuracy with dramatically lower runtime compared to conventional iterative solvers.

Algorithm 1 DL-ADMM

Require: O, number of layers K, and dimensions (a, b, c)

Ensure: R, B

1: Initialize R \in C^{a \times b} and B \in C^{b \times c} with random values.

2: for k = 1 : K do

3:     Z^{(k)} \leftarrow (O + ρ^{(k)} ⊙ (R B - Y^{(k)} ⊘ ρ^{(k)})) ⊘ (1 + ρ^{(k)})

4:     R \leftarrow exp (i \times angle ((Z^{(k)} + Y^{(k)} ⊘ ρ^{(k)}) Ω^{(k)}))

5:     B \leftarrow Ψ (\frac{1}{a} I R^{*} (Z^{(k)} + Y^{(k)} ⊘ ρ^{(k)}), R)

6:     Y^{(k + 1)} \leftarrow Y^{(k)} + ρ^{(k)} ⊙ (Z^{(k)} - R B)

7: end for

Here,

⊙

and

⊘

denote element-wise multiplication and division, respectively, and

\frac{1}{a} I R^{*}

is a low-complexity approximation for the pseudo-inverse of

R

. The trainable variables include

ρ^{(k)} \in R^{a \times c}

,

Y^{(1)} \in C^{a \times c}

, and

Ω^{(k)} \in C^{c \times b}

. The activation function

Ψ (\cdot)

ensures the Frobenius norm constraint on the digital matrix:

\begin{matrix} Ψ (B, R) = B \frac{℘}{{∥ R B ∥}_{F}} . \end{matrix}

(22)
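A forward pass through the unfolded layers of Algorithm 1 might look like the following sketch. It covers inference only: training of ρ^{(k)}, Ω^{(k)}, and Y^{(1)} would require an automatic-differentiation framework and is not shown. The parameter values used here (all-ones ρ^{(k)}, random Ω^{(k)}, zero Y^{(1)}) are untrained placeholders, and the function names `psi` and `dl_admm_forward` are hypothetical.

```python
import numpy as np

def psi(B, R, power):
    """Activation (22): rescale B so that ||R B||_F equals `power`."""
    return B * power / np.linalg.norm(R @ B)

def dl_admm_forward(O, R, B, rho, Omega, Y1, power):
    """Forward pass of Algorithm 1; rho and Omega hold one entry per layer."""
    a = O.shape[0]
    Y = Y1
    for rho_k, Omega_k in zip(rho, Omega):
        Z = (O + rho_k * (R @ B - Y / rho_k)) / (1.0 + rho_k)   # line 3 (element-wise)
        T = Z + Y / rho_k
        R = np.exp(1j * np.angle(T @ Omega_k))                   # line 4
        B = psi(R.conj().T @ T / a, R, power)                    # line 5, R^*/a ~ pinv(R)
        Y = Y + rho_k * (Z - R @ B)                              # line 6
    return R, B

def cplx(rng, *shape):
    """Complex Gaussian array used for placeholder initializations."""
    return rng.standard_normal(shape) + 1j * rng.standard_normal(shape)

rng = np.random.default_rng(0)
a, b, c, K = 16, 4, 2, 5
Q, _ = np.linalg.qr(cplx(rng, a, a))
O = Q[:, :c]                                   # stand-in for W_opt or F_opt
R0 = np.exp(1j * np.angle(cplx(rng, a, b)))    # random unit-modulus start
B0 = cplx(rng, b, c)
rho = [np.ones((a, c)) for _ in range(K)]      # rho^(k) in R^{a x c}
Omega = [cplx(rng, c, b) for _ in range(K)]    # Omega^(k) in C^{c x b}
Y1 = np.zeros((a, c), dtype=complex)
R, B = dl_admm_forward(O, R0, B0, rho, Omega, Y1, power=np.sqrt(c))
```

Each layer costs a handful of matrix products and element-wise operations with no matrix inversion, which is where the reduced per-layer complexity comes from. The output satisfies both constraints of (20) for any parameter values, since R is produced by a phase map and B by Ψ.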

The loss function is designed to jointly minimize the approximation error to the optimal unconstrained beamformers and to preserve the effective channel structure:

\text{loss} = ∥ W_{R F} W_{B B} - W_{opt} ∥_{F} + ∥ F_{R F} F_{B B} - F_{opt} ∥_{F} + ∥ W_{B B}^{*} W_{R F}^{*} H_{obs} F_{R F} F_{B B} - \bar{S} ∥_{F},

(23)

wherein

℘ = \sqrt{N_{s}}

,

\bar{S} = S (1 : N_{s}, 1 : N_{s})

, and

W_{opt} = U (:, 1 : N_{s})

,

F_{opt} = V (:, 1 : N_{s})

are derived from the SVD of

H = U S V^{*}

. Two parallel instances of Algorithm 1 are trained simultaneously: one with

O = W_{opt}

and dimensions

(a, b, c) = (N_{r}, N_{R F r}, N_{s})

, and the other with

O = F_{opt}

and

(a, b, c) = (N_{t}, N_{R F t}, N_{s})

. This DL-ADMM scheme provides a balance between conventional optimization and data-driven learning. It reduces per-iteration complexity, accelerates convergence, and remains robust to CSI imperfections, as confirmed in prior works on model-driven DL for communications [28].
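The loss in (23) can be written directly from its three terms. The sketch below is a plain NumPy evaluation with hypothetical names. The sanity check at the end deliberately ignores the unit-modulus constraint (it plugs the digital targets in as if they were hybrid factors) purely to confirm that the loss vanishes at the ideal solution when the observed channel is error-free.

```python
import numpy as np

def hybrid_loss(W_rf, W_bb, F_rf, F_bb, W_opt, F_opt, H_obs, S_bar):
    """Loss (23): two fitting terms plus an effective-channel term."""
    W = W_rf @ W_bb
    F = F_rf @ F_bb
    fit_w = np.linalg.norm(W - W_opt)                      # combiner fit
    fit_f = np.linalg.norm(F - F_opt)                      # precoder fit
    chan = np.linalg.norm(W.conj().T @ H_obs @ F - S_bar)  # channel-structure term
    return fit_w + fit_f + chan

rng = np.random.default_rng(0)
n_r, n_t, n_s = 8, 8, 2
H = rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t))
U, s, Vh = np.linalg.svd(H)
W_opt, F_opt = U[:, :n_s], Vh.conj().T[:, :n_s]
S_bar = np.diag(s[:n_s])
eye = np.eye(n_s)

# Ideal (unconstrained) factors and an error-free channel: every term is zero.
loss_ideal = hybrid_loss(W_opt, eye, F_opt, eye, W_opt, F_opt, H, S_bar)

# Same factors, but the third term now sees a perturbed channel.
E = 0.3 * (rng.standard_normal((n_r, n_t)) + 1j * rng.standard_normal((n_r, n_t)))
loss_noisy = hybrid_loss(W_opt, eye, F_opt, eye, W_opt, F_opt, H + E, S_bar)
```

The third term couples the two otherwise independent factorizations through H_obs, which is why the two instances of Algorithm 1 are trained jointly rather than separately.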

To facilitate understanding of Algorithm 1, Figure 2 visualizes the computation carried out in a representative unfolded layer k. Each layer corresponds to one ADMM iteration and introduces only two learnable quantities: the layer-dependent penalty parameter

ρ^{(k)}

and the scaling matrix

Ω^{(k)}

employed within the analog/constraint-update module, while all other steps remain deterministic and follow closed-form ADMM updates and projection operators. Therefore, the number of trainable parameters per layer is determined solely by the degrees of freedom of

ρ^{(k)}

and

Ω^{(k)}

, using the same parameterization across layers. Regarding initialization,

R

and

B

are randomly initialized, the dual variable

Y^{(1)}

is initialized to zero, and the layer-wise trainable variables are initialized by setting

ρ^{(k)} = 1

and drawing

Ω^{(k)}

randomly for all

k = 1, \dots, K

, after which they are optimized jointly in an end-to-end manner via gradient descent.

3.2. Regularized System

This section demonstrates how RSVD theory [20] can mitigate noise and minimize the impact of channel estimation errors by redesigning the singular values. To clarify these concepts, the following theorem and corollary are presented.

Theorem 1.

Let

H \in C^{M \times N}

have singular value decomposition

H = U S V^{*}

, where

S = diag (σ_{1}, σ_{2}, \dots, σ_{r})

. For the system

y = H x + n

with Gaussian noise

n

, the Regularized SVD (RSVD) solution is:

\begin{matrix} x_{R S V D} = V S_{r e g}^{- 1} U^{*} y, S_{r e g}^{- 1} = diag (\frac{σ_{i}}{σ_{i}^{1 + α_{i}} + λ}), \end{matrix}

(24)

where

0 < λ \leq \min_{i} {σ_{i}^{2}}

. If

α_{i} \geq {log}_{σ_{i}} (σ_{i}^{2} - λ) - 1

, the RSVD solution reduces noise sensitivity, satisfying:

∥ x_{R S V D} - x ∥_{2} \leq ∥ x_{S V D} - x ∥_{2},

where

x_{S V D} = V S^{- 1} U^{*} y

is the standard SVD solution.

Proof.

The least-squares solution

x_{S V D}

using the Moore–Penrose pseudoinverse of

H

is

x_{S V D} = V S^{- 1} U^{*} y, S^{- 1} = diag (\frac{1}{σ_{i}}),

with error due to noise

n

:

e_{S V D} = x_{S V D} - x = V diag (\frac{1}{σ_{i}}) U^{*} n .

The RSVD solution is

x_{R S V D} = V S_{reg}^{- 1} U^{*} y,

with error:

e_{R S V D} = x_{R S V D} - x = V diag (\frac{σ_{i}}{σ_{i}^{1 + α_{i}} + λ}) U^{*} n .

Therefore, the error norms are

∥ e_{S V D} ∥_{2} = \sqrt{\sum_{i = 1}^{N} {(\frac{| u_{i}^{*} n |}{σ_{i}})}^{2}}

, and

∥ e_{R S V D} ∥_{2} = \sqrt{\sum_{i = 1}^{N} {(\frac{| u_{i}^{*} n | σ_{i}}{σ_{i}^{1 + α_{i}} + λ})}^{2}} .

For

α_{i} \geq {log}_{σ_{i}} (σ_{i}^{2} - λ) - 1

,

σ_{i}^{1 + α_{i}} + λ \geq σ_{i}^{2} \quad \text{and} \quad \frac{σ_{i}}{σ_{i}^{1 + α_{i}} + λ} \leq \frac{1}{σ_{i}},

ensuring RSVD reduces noise amplification more effectively. □

Corollary 1.

In massive MIMO systems with channel estimation errors, where the observed channel has the SVD

H_{obs} = \hat{U} \hat{S} \hat{V}^{*}

(

\hat{S} = diag ({\tilde{σ}}_{1}, \dots, {\tilde{σ}}_{r})

), the RSVD solution achieves an error norm given by:

∥ e_{R S V D} ∥_{2} = \sqrt{\sum_{i = 1}^{N} {(\frac{{\tilde{σ}}_{i}}{{\tilde{σ}}_{i}^{1 + α_{i}} + λ})}^{2} {| {(\hat{U}^{*} H_{e} x)}_{i} + {(\hat{U}^{*} n)}_{i} |}^{2}} .

Under the condition

α_{i} > {log}_{{\tilde{σ}}_{i}} ({\tilde{σ}}_{i}^{2} - λ) - 1

, RSVD effectively reduces noise amplification and channel estimation errors, ensuring:

∥ e_{RSVD} ∥_{2} \leq ∥ e_{SVD} ∥_{2} .

Theorem 1 and Corollary 1 show that RSVD improves on the standard SVD by adding a regularization term, which enhances stability and robustness. It limits noise amplification by modifying small singular values, balancing data fidelity and noise suppression.

3.3. RSVD in Hybrid Beamforming

The previous subsection established a DL-ADMM framework for low-complexity hybrid beamforming design. However, when the observed channel

H_{obs}

contains estimation errors, even well-designed hybrid beamformers

{\hat{W}}_{B B}, {\hat{W}}_{R F}, {\hat{F}}_{R F}, {\hat{F}}_{B B}

cannot fully mitigate the mismatch between the actual and observed singular structures of the channel. As a result, the achievable spectral efficiency degrades, especially at high SNR where error amplification in the singular values becomes dominant [5,6,20]. To address this issue, we propose a Regularized Hybrid Beamforming mechanism based on RSVD.

Motivation

In standard SVD-based designs, the optimal precoders and combiners align with the largest singular modes of

H

. However, under imperfect CSI, the estimated singular values

{\tilde{σ}}_{i}

can deviate significantly from their true values. Small singular values in particular are highly sensitive to noise and estimation error, which can lead to severe mismatch in the effective channel

G

and hence reduced spectral efficiency. RSVD counteracts this problem by adding a bias (regularization term

λ

) that prevents small singular values from dominating the beamformer design. This approach is theoretically justified by Theorem 1 and Corollary 1, which establish improved conditioning and noise robustness.

Proposed mechanism

As shown in Figure 3, the proposed design introduces a regularization filter

R e g

between the data streams and the baseband precoder. The construction of

R e g

proceeds in the following steps:

Step 1: Initialization. Set $\mathrm{Reg} = I_{N_{s}}$ as the initial filter. Compute the observed channel SVD, $H_{obs} = \hat{U} \hat{S} \hat{V}^{*}$, and design ${\hat{W}}_{BB}$, ${\hat{W}}_{RF}$, ${\hat{F}}_{RF}$, and ${\hat{F}}_{BB}$ using DL-ADMM under imperfect CSI. At this stage, the hybrid beamformers approximate the unconstrained SVD solution, but are still vulnerable to error-induced spectral efficiency loss.
Step 2: Regularization parameter design. Following (10), Theorem 1, and Corollary 1, we enforce a robustness inequality that guarantees improved performance:

$∥ {\bar{G}}_{1} \mathrm{Reg}^{2} {\bar{G}}_{2} ∥_{\infty} \geq ∥ {\bar{S}}^{2} ∥_{\infty} + P > ∥ {\bar{S}}^{2} ∥_{\infty},$

(25)

where $\bar{S} = \hat{S} (1 : N_{s}, 1 : N_{s})$ is the truncated singular value matrix, ${\bar{G}}_{1} = H_{obs} {\hat{F}}_{RF} {\hat{F}}_{BB}$ and ${\bar{G}}_{2} = {\hat{W}}_{BB}^{*} {\hat{W}}_{RF}^{*} H_{obs} {\hat{F}}_{RF} {\hat{F}}_{BB}$ capture the effective channel components, and $P > 0$ is a penalty term that can be chosen as a function $f (∥ {\bar{G}}_{1} {\bar{G}}_{2} - {\bar{S}}^{2} ∥_{F})$. The use of the ∞-norm ensures that the dominant channel gains are reinforced, guaranteeing that the regularized design does not underperform compared to the unregularized case.

From (25), we derive the adjustment factor

α_{i}

for each singular mode as

\begin{matrix} α_{i} = {log}_{{\tilde{σ}}_{i}} (\frac{\sqrt{{max}_{i} ({\tilde{σ}}_{i}^{2}) + P}}{{\tilde{σ}}_{i}} - λ / \sqrt{∥ {\bar{G}}_{1} ∥_{\infty} {∥ {\bar{G}}_{2} ∥}_{\infty}}) - 1, \end{matrix}

(26)

where

λ

is a tunable regularization constant. The parameter

α_{i}

ensures that the contribution of each singular mode is properly balanced under CSI uncertainty.

Step 3: Construction of $R e g$ . Using the $α_{i}$ values and $λ$ , define the diagonal regularization filter as

$\begin{matrix} R e g = diag (\frac{σ_{1}^{1 + α_{1}} + λ}{σ_{1}}, \dots, \frac{σ_{N_{s}}^{1 + α_{N_{s}}} + λ}{σ_{N_{s}}}) . \end{matrix}$

(27)

This filter amplifies strong singular values while preventing weak and error-prone ones from dominating the effective channel.
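Step 3 can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions rather than the authors' code: the singular values and λ are hypothetical, `build_reg` is an illustrative name, and α_i is set from the bound of Theorem 1 instead of (26), because the inputs of (26) (P and the norms of the effective channel components) are not reproduced here.

```python
import numpy as np

def build_reg(sigma, alpha, lam):
    """Diagonal filter of (27): entries (sigma_i**(1 + alpha_i) + lam) / sigma_i."""
    sigma = np.asarray(sigma, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    return np.diag((sigma ** (1.0 + alpha) + lam) / sigma)

# Hypothetical leading N_s = 4 singular values of the observed channel (all > 1)
sigma_bar = np.array([6.0, 4.0, 3.0, 2.0])
lam = 1.0                                    # assumed value, below min sigma_i^2

# alpha_i exactly at the Theorem 1 bound: sigma_i**(1 + alpha_i) = sigma_i**2 - lam
alpha_eq = np.log(sigma_bar**2 - lam) / np.log(sigma_bar) - 1.0
reg_eq = build_reg(sigma_bar, alpha_eq, lam)

# alpha_i above the bound: every diagonal entry of Reg grows
reg = build_reg(sigma_bar, alpha_eq + 0.2, lam)
```

At the bound, the diagonal of the filter reduces to the singular values themselves, since σ_i^{1+α_i} + λ = σ_i². Raising α_i above the bound increases every entry when σ_i > 1.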

Resulting spectral efficiency

The new effective channel becomes

\breve{G} = \hat{W}^{*} H \hat{F} \, \mathrm{Reg},

(28)

leading to the regularized spectral efficiency

\breve{R} = {log}_{2} (\det (I_{N_{s}} + \frac{ϱ}{N_{s}} {\hat{R}}_{n}^{-1} \breve{G} \breve{G}^{*})),

(29)

with

{\hat{R}}_{n} = σ^{2} \hat{W}^{*} \hat{W}

. This value

\breve{R}

represents an enhanced achievable rate that is robust to estimation errors and, under the robustness condition (25), is at least as large as the unregularized mismatch rate

\hat{R}

.
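The rate expression (29) can be evaluated with a few lines of NumPy. The sketch below is an illustration, not the authors' implementation: the function name and argument names are assumptions, `F` and `W` stand for the overall precoder and combiner, and the toy matrices in the usage lines are chosen only so that the result can be verified by hand.

```python
import numpy as np

def spectral_efficiency(H, F, W, Reg, rho, noise_var):
    """Evaluate (29) for the effective channel G = W^* H F Reg of (28)."""
    n_s = Reg.shape[0]
    G = W.conj().T @ H @ F @ Reg                  # N_s x N_s effective channel
    R_n = noise_var * (W.conj().T @ W)            # noise covariance after combining
    A = np.eye(n_s) + (rho / n_s) * np.linalg.solve(R_n, G @ G.conj().T)
    _, logabsdet = np.linalg.slogdet(A)           # numerically stable log-determinant
    return logabsdet / np.log(2.0)                # natural log converted to bits

# Toy check: identity channel, two streams, no regularization (Reg = I)
H = np.eye(4)
F = np.eye(4)[:, :2]
W = np.eye(4)[:, :2]
rate = spectral_efficiency(H, F, W, np.eye(2), rho=2.0, noise_var=1.0)
```

In the toy case the matrix inside the determinant is 2I, so the rate is log2(4) = 2 bits/s/Hz. Using `slogdet` and `solve` avoids forming an explicit inverse or a determinant that may overflow for large arrays.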

Based on extensive numerical experiments under varying SNR and CSI-error levels, we select the RSVD-related hyperparameters using a simple stability–complexity rule and keep them fixed across all tests unless otherwise stated. In practice, RSVD is applied to the estimated effective channel to obtain a compact low-rank spectral representation, and regularization is then introduced by softly damping the weakest (most noise-sensitive) spectral components before they are used in the subsequent beamformer updates. This “spectrum-aware” damping is implemented as a diagonal stabilizer that reduces the influence of ill-conditioned modes, thereby preventing error amplification caused by imperfect CSI while preserving the dominant signal subspace.

While ADMM unfolding and singular-value regularization are each well established in the literature, the novelty of our approach lies in their coupled co-design for hybrid beamforming under imperfect CSI. Specifically, the proposed DL-ADMM unrolling does not merely replace iterations with layers; it learns layer-wise ADMM quantities (e.g., penalty/step parameters and update mappings) to efficiently estimate the hybrid beamforming factors

(F_{RF}, F_{BB}, W_{RF}, W_{BB})

in an amortized manner. These learned beamformers are then explicitly integrated with an RSVD-style stabilization stage that constructs a mode-adaptive diagonal filter,

Reg

, to prevent noise and CSI mismatch from being amplified by ill-conditioned singular modes. Unlike prior works that apply SVD/RSVD as a standalone post-processing step, our stabilization is tied to the learned effective channel through the robustness constraint and the resulting mode-wise adjustment

{α_{i}}

, and it is further aligned with learning via a singular-structure preservation term in the training objective. Therefore, the two components do not form a simple sequential stack: the unfolded optimizer shapes the effective channel on which the RSVD stabilization operates, and the stabilization in turn suppresses weak, error-sensitive modes that would otherwise degrade the unfolded solution under estimation mismatch. This joint design yields a unified architecture that is simultaneously computationally efficient and robust by construction.

Note: The proposed DL-ADMM is trained and evaluated on channel realizations generated from the clustered geometric mmWave/sub-THz model in (2), which implies an angular-domain sparse propagation structure (few dominant paths). Imperfect CSI is incorporated during both training and testing by forming the observed channel as

H_{obs} = H + H_{e}

, where

vec (H_{e}) \sim CN (0, σ_{e}^{2} I)

(cf. (6)). Since the network takes

H_{obs}

as input and outputs hybrid precoders/combiners, the same trained model is applied at inference across the full operating SNR range considered in the following section; as SNR increases, the impact of CSI mismatch becomes more pronounced and the RSVD component yields larger robustness gains by suppressing error-sensitive singular modes, whereas at low SNR the performance gap naturally narrows in the noise-limited regime. Finally, the framework is not tied to a single channel model: if propagation deviates from sparse clustered conditions (e.g., richer scattering with larger

N_{c l} N_{r a y}

, different angular spreads, or Rayleigh/Rician fading), the same unfolded architecture can be retrained/fine-tuned using samples drawn from the target channel distribution, enabling a controlled assessment of performance shifts under alternative channel statistics without modifying the proposed design.
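The imperfect-CSI model described in this note can be sketched as follows. This is a minimal illustration with assumed names (`observe_channel`, `sigma_e2`); the zero matrix used for H is a placeholder so that the error statistics can be inspected in isolation.

```python
import numpy as np

def observe_channel(H, sigma_e2, rng):
    """Return (H_obs, H_e) with H_obs = H + H_e and i.i.d. CN(0, sigma_e2) entries in H_e."""
    # A circularly symmetric complex Gaussian splits its variance
    # equally between the real and imaginary parts.
    scale = np.sqrt(sigma_e2 / 2.0)
    H_e = scale * (rng.standard_normal(H.shape) + 1j * rng.standard_normal(H.shape))
    return H + H_e, H_e

rng = np.random.default_rng(1)
H = np.zeros((64, 64), dtype=complex)         # placeholder true channel
H_obs, H_e = observe_channel(H, sigma_e2=0.1, rng=rng)
emp_var = np.mean(np.abs(H_e) ** 2)           # should be close to sigma_e2
```

Applying the same function to each training and test channel reproduces the setting in which the network only ever sees the observed channel.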

4. Simulation Results

This section presents simulation results to evaluate the efficiency of the proposed DL-ADMM algorithm (Algorithm 1) and the regularized hybrid beamforming scheme called DL-ADMM-Reg. We consider transmitter and receiver arrays of sizes

(N_{t} = 32, N_{r} = 32)

and

(N_{t} = 64, N_{r} = 64)

. The channel follows a clustered model with

N_{cl} = 4

clusters and

N_{ray} = 10

rays per cluster. The angular spread is set to

10^{°}

, and the average transmit power is normalized to

ϱ = 1

. A total of 2000 channels are generated, equally divided into training and testing sets. The DL-ADMM framework is implemented in TensorFlow, using a depth of five layers. Training is performed for 1000 iterations (five epochs of 200 iterations each, as detailed in Section 4.1) with a learning rate of 0.01 and default exponential decay. The activation

Ψ (\cdot)

enforces the Frobenius-norm constraint, while

ρ^{(k)}

and

Ω^{(k)}

are initialized randomly and updated through gradient descent. Baseline methods include OMP [10], APS [14], and two benchmarks: Optimal (digital-only unconstrained precoding with perfect CSI) and Mismatch (digital-only unconstrained precoding under imperfect CSI).
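For orientation, a clustered channel with the stated parameters can be generated as in the sketch below. It is not taken from the paper: the half-wavelength uniform linear arrays, the uniformly distributed cluster centers, the Laplacian intra-cluster offsets, the CN(0, 1) path gains, and the normalization factor are common modeling assumptions, and the exact form of the channel model (2) may differ.

```python
import numpy as np

def ula_response(n, phi):
    """Steering vector of a half-wavelength uniform linear array (assumed geometry)."""
    k = np.arange(n)
    return np.exp(1j * np.pi * k * np.sin(phi)) / np.sqrt(n)

def clustered_channel(nt, nr, n_cl=4, n_ray=10, spread_deg=10.0, rng=None):
    """Sum of n_cl * n_ray paths grouped into clusters; returns an nr x nt matrix."""
    rng = np.random.default_rng() if rng is None else rng
    spread = np.deg2rad(spread_deg)
    H = np.zeros((nr, nt), dtype=complex)
    for _ in range(n_cl):
        # Cluster mean angles of departure and arrival (assumed uniform)
        mean_aod, mean_aoa = rng.uniform(-np.pi / 2, np.pi / 2, size=2)
        for _ in range(n_ray):
            # Laplacian offsets; scale b gives standard deviation b * sqrt(2)
            aod = mean_aod + rng.laplace(0.0, spread / np.sqrt(2.0))
            aoa = mean_aoa + rng.laplace(0.0, spread / np.sqrt(2.0))
            gain = (rng.standard_normal() + 1j * rng.standard_normal()) / np.sqrt(2.0)
            H += gain * np.outer(ula_response(nr, aoa), ula_response(nt, aod).conj())
    # Normalization so that the expected squared Frobenius norm equals nt * nr
    return np.sqrt(nt * nr / (n_cl * n_ray)) * H

rng = np.random.default_rng(0)
H = clustered_channel(32, 32, rng=rng)
```

Calling the function 2000 times and splitting the results in half would mirror the dataset size described above.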

To make the proposed RSVD-based regularization reproducible, we specify how the key parameters are set in practice. For the penalty parameter P in the RSVD filter, we select

P = 2.6 ∥ {\bar{G}}_{1} {\bar{G}}_{2} - {\bar{S}}^{2} ∥_{F}

for the

32 \times 32

array and

P = 2 ∥ {\bar{G}}_{1} {\bar{G}}_{2} - {\bar{S}}^{2} ∥_{F}

for the

64 \times 64

case; these values were determined empirically via numerical experiments to balance robustness (stability under CSI uncertainty) and efficiency (avoiding over-regularization). Moreover, we use

λ = \min_{i} {σ_{i}^{2}}

for both settings to anchor the shrinkage to the weakest singular mode, which prevents ill-conditioned components from dominating the update, and we set

α_{i} = {log}_{σ_{i}} (σ_{i}^{2} - λ) - 1

to adapt the mode-wise attenuation according to the relative strength of each singular component. In our experiments, keeping the same

λ

and

α_{i}

rule across both array sizes yielded stable behavior without case-by-case tuning, while only P was mildly adjusted to reflect the change in problem scale.
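The parameter rule above can be written compactly as below. This is a hedged sketch, not the authors' code: the function name, the `residual_fro` argument (the precomputed Frobenius-norm residual appearing in the expression for P), and the `backoff` factor are assumptions. The back-off is needed in this sketch because taking λ exactly equal to min σ_i² makes σ_i² − λ vanish for the weakest mode, where the logarithm is undefined. The sketch also requires σ_i ≠ 1.

```python
import numpy as np

# Scale factors for P reported for the 32 x 32 and 64 x 64 arrays
P_SCALE = {32: 2.6, 64: 2.0}

def select_rsvd_params(sigma, residual_fro, n_ant, backoff=1e-3):
    """Return (P, lam, alpha) following the rule of this section.

    backoff is a hypothetical safeguard keeping lam slightly below min sigma_i^2.
    """
    sigma = np.asarray(sigma, dtype=float)
    P = P_SCALE[n_ant] * residual_fro
    lam = (1.0 - backoff) * np.min(sigma**2)
    # log_{sigma_i}(x) computed as ln(x) / ln(sigma_i)
    alpha = np.log(sigma**2 - lam) / np.log(sigma) - 1.0
    return P, lam, alpha

sigma = np.array([6.0, 4.0, 3.0, 2.0])       # hypothetical singular values
P, lam, alpha = select_rsvd_params(sigma, residual_fro=0.5, n_ant=32)
```

With this choice, σ_i^{1+α_i} + λ equals σ_i² for every mode, which is the equality case of the condition in Theorem 1.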

To avoid attributing improvements to particular baseline configuration choices, we emphasize that the simulation settings above are adopted as representative mmWave/XL-MIMO test conditions and are not inherently tailored to favor the proposed method. The same channel model, dataset split, training protocol, and computational budget are applied consistently to all learning-based components, and all competing baselines are implemented under their standard assumptions and identical CSI conditions to ensure a fair comparison. Importantly, the proposed DL-ADMM and DL-ADMM-reg are not tied to the specific array sizes, clustered parameters, or training hyperparameters reported here; the same design and evaluation pipeline can be applied without modification to other antenna dimensions, propagation environments, or SNR/CSI-uncertainty regimes. Therefore, the reported gains should be interpreted as stemming from the proposed model-driven unfolding and explicit regularization mechanism rather than from any particular choice of the illustrative simulation configuration.

4.1. Convergence Analysis

To further assess the stability of the proposed DL-ADMM framework, we analyze the convergence behavior of the training and validation losses for both antenna configurations. Figure 4 illustrates the evolution of the loss function over five epochs for (a)

(32 \times 32)

and (b)

(64 \times 64)

antenna arrays. Each epoch consists of 200 iterations, meaning that the full training process involves only 1000 iterations in total. This setting demonstrates the efficiency of the proposed method, as convergence is achieved with a relatively small number of updates compared to conventional iterative optimization algorithms.

In both cases, the losses decrease sharply within the first two epochs (i.e., within the first 400 iterations) and gradually stabilize thereafter, indicating rapid convergence of the unfolded network. An important observation is that the training and validation loss curves remain consistently close throughout the training process. This demonstrates that the model generalizes well to unseen data and avoids the common problem of overfitting. Specifically, in the

(32 \times 32)

case, both losses converge to values around 7–8, while in the

(64 \times 64)

case they converge around 13–14. The slightly higher residual loss in the larger array is expected due to the increased number of optimization variables, but the overall convergence trend remains stable. These results confirm that the DL-ADMM architecture not only achieves low-complexity optimization but also provides reliable convergence under both small- and large-scale antenna settings, ensuring robustness in practical hybrid beamforming deployments.

4.2. Spectral Efficiency vs. SNR

Figure 5 illustrates the spectral efficiency performance of the proposed DL-ADMM-Reg scheme compared with APS, OMP, conventional ADMM, and benchmark curves (Optimal and Mismatch) under an estimation error variance of

σ_{e}^{2} = 0.1

. Across the entire SNR range, DL-ADMM-Reg exhibits performance very close to the Optimal curve, consistently outperforming APS, OMP, and conventional ADMM. The improvement over the Mismatch baseline highlights the effectiveness of the RSVD filter in mitigating singular value distortion caused by channel estimation errors. Compared to conventional ADMM, the DL-ADMM-Reg approach achieves higher robustness and stability, benefiting from its data-driven initialization and learned parameter tuning, which accelerate convergence and reduce sensitivity to CSI imperfections. APS achieves competitive performance compared to OMP, benefiting from its adaptive clustering approach, but it still lags behind ADMM-based schemes due to its reliance on centroid-based precoding. OMP shows the lowest spectral efficiency, as its reliance on predefined codebooks reduces adaptability under imperfect CSI. Comparing subfigures (a) and (b), the larger

64 \times 64

array in Figure 5b delivers higher spectral efficiency than the

32 \times 32

case in Figure 5a. This is a direct result of the additional spatial degrees of freedom, which enhance multiplexing and beamforming gain. Importantly, DL-ADMM-Reg demonstrates strong scalability with antenna size, retaining stability and robustness across different configurations.

To complement the spectral-efficiency curves, we report the average relative error of each hybrid beamforming method with respect to the Optimal (digital-only) benchmark, where a smaller value indicates a closer match to the unconstrained optimum under the same channel realizations. As shown in Table 1, DL-ADMM-Reg yields a lower error than the conventional baselines (ADMM, APS, and OMP), demonstrating that the proposed RSVD-stabilized unfolded design more reliably preserves the dominant eigenmodes of the effective channel and reduces mismatch-induced degradation. Compared with recent DL baselines [18,19], DL-ADMM-Reg achieves similarly low error while offering stronger consistency across antenna dimensions; notably, for the larger

(64, 64)

array the error remains very small, highlighting that the proposed model-driven unfolding coupled with explicit regularization scales favorably and maintains proximity to the optimal performance.
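One plausible way to compute such a metric is sketched below. The exact definition used for Table 1 is not given in the text, so the formula here (mean of the absolute rate gap divided by the Optimal rate, taken over matched channel realizations) is an assumption, as are the function name and the sample values.

```python
import numpy as np

def average_relative_error(rate_method, rate_optimal):
    """Mean of |R_opt - R_method| / R_opt over matched channel realizations."""
    rate_method = np.asarray(rate_method, dtype=float)
    rate_optimal = np.asarray(rate_optimal, dtype=float)
    return float(np.mean(np.abs(rate_optimal - rate_method) / rate_optimal))

# Hypothetical rates (bits/s/Hz) for two realizations
err = average_relative_error([9.0, 18.0], [10.0, 20.0])
```

Here both realizations fall 10% short of the benchmark, so the average relative error is 0.1.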

4.3. Spectral Efficiency vs. Channel Estimation Error

Figure 6 presents spectral efficiency as a function of the channel estimation error variance

σ_{e}^{2}

at fixed

S N R = 10

dB. Both antenna configurations are considered. The Optimal curve provides the theoretical upper bound, while the Mismatch curve highlights the worst-case performance when imperfect CSI is directly used without compensation. The results show that DL-ADMM-Reg consistently delivers higher spectral efficiency across all error levels. In the

32 \times 32

configuration, spectral efficiency decreases rapidly as

σ_{e}^{2}

increases, reflecting limited robustness of smaller arrays. In the

64 \times 64

configuration, however, the degradation is less severe: the larger number of antennas increases spatial diversity and reduces sensitivity to estimation errors. This demonstrates that the proposed method not only provides robust compensation for CSI errors but also leverages the inherent robustness of large-scale antenna systems.

In conclusion, Figure 5 and Figure 6 provide a clear quantitative validation of the proposed scheme. As the channel estimation error increases, the spectral efficiency of OMP and APS degrades rapidly because their beam selection/phase updates are highly sensitive to error-distorted singular subspaces, whereas the proposed DL-ADMM-Reg maintains a substantially higher rate by jointly optimizing RF/baseband variables and explicitly regularizing vulnerable singular modes. For example, at

σ_{e}^{2} = 0.1

, DL-ADMM-Reg achieves a gain of approximately 8–11 bits/s/Hz over OMP in the tested SNR range (20–40 dB), while remaining less than 1 bit/s/Hz below the digital-optimal benchmark, demonstrating both robustness and near-optimality under imperfect CSI. This consistent gap across error levels confirms that the advantage is not anecdotal but stems from the proposed model-driven unfolding and RSVD-based conditioning control.

4.4. Error Analysis

An essential aspect of evaluating hybrid beamforming performance under imperfect CSI is understanding the statistical distribution of channel estimation errors. To this end, we investigate the error distribution for two antenna configurations,

(32 \times 32)

and

(64 \times 64)

. Figure 7 illustrates both histograms and empirical probability density functions (PDFs) of the estimation errors. The histograms (left column) show the raw occurrence of error samples, while the smoothed PDFs (right column), obtained using kernel density estimation, reveal that the errors closely follow a Gaussian distribution centered at zero. This confirms the common assumption in the massive MIMO literature that channel estimation errors can be modeled as normally distributed random variables. For the

(32 \times 32)

configuration, the error variance is relatively small, resulting in a narrow Gaussian spread around the mean. In contrast, for the

(64 \times 64)

configuration, the distribution is wider, reflecting a higher variance due to the increased number of antennas and associated estimation complexity. Despite the difference in variance, both cases align well with the Gaussian assumption, validating its suitability for error modeling in the proposed framework. The Gaussian nature of estimation errors is particularly relevant for the proposed Regularized Hybrid Beamforming, as the RSVD-based filter is designed to stabilize the impact of normally distributed noise on the singular values of the observed channel matrix. By confirming normality in the error distribution, these results reinforce the effectiveness of the proposed design.
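A simple numerical complement to the histogram and density plots is to compare sample moments with their Gaussian values. The sketch below is illustrative only: the synthetic samples, the sample size, and the function name are assumptions, and the samples model the real part of a CN(0, σ_e²) error, whose variance is σ_e²/2.

```python
import numpy as np

def gaussian_summary(samples):
    """Return (mean, variance, skewness, excess kurtosis) of real-valued samples."""
    x = np.asarray(samples, dtype=float)
    mean = x.mean()
    var = x.var()
    z = (x - mean) / np.sqrt(var)             # standardized samples
    skewness = np.mean(z**3)                  # 0 for a Gaussian
    excess_kurtosis = np.mean(z**4) - 3.0     # 0 for a Gaussian
    return mean, var, skewness, excess_kurtosis

rng = np.random.default_rng(2)
sigma_e2 = 0.1
errors = np.sqrt(sigma_e2 / 2.0) * rng.standard_normal(100_000)
mean, var, skew, kurt = gaussian_summary(errors)
```

Skewness and excess kurtosis close to zero, together with a mean near zero, are consistent with the Gaussian shape seen in the density estimates.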

4.5. Analysis of Complexity

To better understand the computational efficiency of the proposed DL-ADMM algorithm, we now analyze its per-iteration complexity and compare it with conventional hybrid beamforming methods. In Algorithm 1, the main computational cost arises in lines 4–6, where the term

R B

is repeatedly recalculated for the updates of

Z

,

R

, and

B

. Assuming

R \in C^{a \times b}

and

B \in C^{b \times c}

, the product

R B

has dimension

a \times c

and requires

O (a b c)

operations. Since this computation appears three times per iteration, the total dominant cost is

O (3 a b c)

. This makes the DL-ADMM scheme both structured and computationally efficient. To relate

(a, b, c)

to the hybrid beamforming problem, note that a corresponds to the number of antennas (

N_{t}

or

N_{r}

), b corresponds to the number of RF chains (

N_{R F t}

or

N_{R F r}

), and c corresponds to the number of data streams (

N_{s}

). Thus, the complexity of the DL-ADMM algorithm per iteration can be expressed as

O (3 N_{t (r)} N_{R F t (r)} N_{s})

, where

N_{t (r)}

and

N_{R F t (r)}

refer to the transmitter (or receiver) dimensions depending on whether precoders or combiners are being computed.
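The count above is easy to tabulate for the two simulated configurations. The short sketch below uses the settings from the figure captions (four RF chains, four data streams) and the five-layer depth stated in the simulation setup; the function name is illustrative, and constant factors hidden by the O-notation are ignored.

```python
def dl_admm_ops_per_layer(n_ant, n_rf, n_s):
    """Dominant multiply count 3 * a * b * c for the three R B products per layer."""
    return 3 * n_ant * n_rf * n_s

DEPTH = 5  # number of unfolded layers

ops_32 = dl_admm_ops_per_layer(32, 4, 4)      # 1536 per layer
ops_64 = dl_admm_ops_per_layer(64, 4, 4)      # 3072 per layer
total_32 = DEPTH * ops_32                     # 7680 over the full network
total_64 = DEPTH * ops_64                     # 15360 over the full network
```

Doubling the array size doubles the dominant cost, which reflects the linear dependence on the antenna dimension in this expression.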

Figure 8 illustrates the practical computational burden of the compared algorithms under the parameter settings of Table 2. As seen, OMP incurs the highest cost, scaling rapidly with antenna size due to its quadratic dependence on

N_{t (r)}

. APS exhibits moderate complexity, but its reliance on clustering and iterative updates (

M = 10

) makes it more expensive than ADMM-based schemes. Conventional ADMM and DL-ADMM have nearly identical theoretical complexity; however, DL-ADMM avoids iterative convergence checks by unfolding ADMM steps into a fixed-depth network, leading to reduced runtime overhead. Importantly, DL-ADMM achieves a balance between low computational complexity and improved robustness, making it well-suited for large-scale massive MIMO under imperfect CSI.

The results reveal a clear performance–complexity trade-off when comparing DL-ADMM-Reg with the benchmark beamforming methods. Greedy/codebook-based schemes such as OMP are simple to implement but suffer a pronounced spectral-efficiency loss under imperfect CSI, whereas fully digital or high-precision iterative solutions can approach optimal performance at the cost of substantially higher per-iteration matrix operations and slower runtime scaling with

(N_{t}, N_{r})

. In contrast, DL-ADMM-Reg achieves near-digital performance while keeping complexity predictable and moderate: it runs a fixed number of unfolded layers (iterations) with structured ADMM updates and simple projection steps, and the RSVD regularization adds only a one-time SVD-type operation whose cost is amortized across the fixed-depth network. For the same antenna/RF-chain settings, our method requires far fewer iterative search steps than OMP and avoids the repeated greedy atom selection, while delivering significantly higher spectral efficiency; compared with conventional iterative optimizers, it reduces the number of required iterations by replacing convergence-driven stopping with a fixed-depth forward pass. These comparisons show that the proposed approach offers a favorable operating point: substantial robustness and spectral-efficiency gains for a controlled and scalable computational budget.

5. Conclusions

In this paper, we proposed a novel hybrid beamforming mechanism for massive MIMO systems to explicitly address the challenges posed by channel estimation errors. Our approach integrates a low-complexity DL-based ADMM framework with an RSVD strategy, enabling simultaneous optimization of both baseband and RF chain components. Unlike conventional OMP or APS schemes, which rely on predefined codebooks and are highly sensitive to CSI imperfections, the proposed DL-ADMM design learns adaptive mappings that significantly reduce computational complexity while maintaining near-optimal performance. To further enhance robustness, we introduced a regularization filter,

R e g

, that strengthens the singular values of the observed channel. This addition mitigates the mismatch effects caused by estimation errors, providing theoretical guarantees of improved stability and noise resilience. Simulation results demonstrated that the proposed DL-ADMM-Reg method consistently outperforms conventional baselines and the mismatch benchmark across a wide range of SNR values and channel estimation error variances. Notably, the method achieves spectral efficiency close to the optimal bound, even under moderate-to-severe CSI imperfections, and exhibits improved scalability with larger antenna arrays. Overall, the combination of DL-based unfolding for complexity reduction and RSVD-based regularization for robustness presents a promising direction for practical hybrid beamforming in beyond-5G and 6G systems. Future research could extend this framework to multi-user MIMO scenarios, explore its application to wideband mmWave and THz channels, and investigate hardware implementation aspects to further validate its real-time feasibility.

Author Contributions

S.P.A.: Conceptualization, software implementation, methodology design, writing—original draft preparation, validation, and manuscript editing. A.N.: Methodology development, software implementation, writing, review, data collection, and editing. S.-C.C.: Conceptualization, writing, validation, and manuscript editing. R.-H.L.: Conceptualization, methodology, supervision, funding, writing—original draft preparation, and manuscript editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Larsson, E.G.; Edfors, O.; Tufvesson, F.; Marzetta, T.L. Massive MIMO for next generation wireless systems. IEEE Commun. Mag. 2014, 52, 186–195.
2. Han, S.; Chih-Lin, I.; Xu, Z.; Rowell, C. Large-scale antenna systems with hybrid analog and digital beamforming for millimeter wave 5G. IEEE Commun. Mag. 2015, 53, 186–194.
3. Rappaport, T.S.; Xing, Y.; Kanhere, O.; Ju, S.; Madanayake, A.; Mandal, S.; Alkhateeb, A.; Trichopoulos, G.C. Wireless communications and applications above 100 GHz: Opportunities and challenges for 6G and beyond. IEEE Access 2019, 7, 78729–78757.
4. Wang, Y.; Chen, X.; Cai, Y.; Champagne, B.; Hanzo, L. Channel estimation for hybrid massive MIMO systems with adaptive-resolution ADCs. IEEE Trans. Commun. 2022, 70, 2131–2146.
5. Myers, N.J.; Pachai Kannu, A. Impact of Channel Estimation Errors on Single Stream MIMO Beamforming. IEEE Commun. Lett. 2017, 21, 1345–1348.
6. Chen, Y.; Wen, X.; Lu, Z. Achievable Spectral Efficiency of Hybrid Beamforming Massive MIMO Systems With Quantized Phase Shifters, Channel Non-Reciprocity and Estimation Errors. IEEE Access 2020, 8, 71304–71317.
7. Heath, R.W.; Gonzalez-Prelcic, N.; Rangan, S.; Roh, W.; Sayeed, A.M. An overview of signal processing techniques for millimeter wave MIMO systems. IEEE J. Sel. Top. Signal Process. 2016, 10, 436–453.
8. Mendez-Rial, R.; Rusu, C.; González-Prelcic, N.; Heath, R.W. Dictionary-free hybrid precoders and combiners for mmWave MIMO systems. In Proceedings of the 2015 IEEE 16th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Stockholm, Sweden, 28 June–1 July 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 151–155.
9. Wang, Y.; Zou, W. Low complexity hybrid precoder design for millimeter wave MIMO systems. IEEE Commun. Lett. 2019, 23, 1259–1262.
10. El Ayach, O.; Rajagopal, S.; Abu-Surra, S.; Pi, Z.; Heath, R.W. Spatially sparse precoding in millimeter wave MIMO systems. IEEE Trans. Wirel. Commun. 2014, 13, 1499–1513.
11. Zhao, L.; Luo, Z.; Liu, H.; Pu, X.; Kuang, Q. MmWave relay systems with robust hybrid transceiver designs under correlated channel estimation errors. Digit. Signal Process. 2022, 127, 103541.
12. Sohrabi, F.; Yu, W. Hybrid digital and analog beamforming design for large-scale antenna arrays. IEEE J. Sel. Top. Signal Process. 2016, 10, 501–513.
13. Wang, Y.; Zou, W. Low complexity hybrid precoding algorithm in millimeter wave MIMO systems. In Proceedings of the 2018 IEEE/CIC International Conference on Communications in China (ICCC Workshops), Beijing, China, 16–18 August 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 16–20.
14. Alouzi, M.; Yanikomeroglu, H.; Karabulut Kurt, G. Adaptive Phase Shifters for Hybrid Beamforming in mmWave Systems. IEEE Trans. Wirel. Commun. 2025, 24, 1104–1116.
15. Huang, H.; Yang, J.; Huang, H.; Song, Y.; Gui, G. Deep learning for super-resolution channel estimation and DOA estimation based massive MIMO system. IEEE Trans. Veh. Technol. 2018, 67, 8549–8560.
16. Lavdas, S.; Gkonis, P.K.; Zinonos, Z.; Trakadas, P.; Sarakis, L.; Papadopoulos, K. A machine learning adaptive beamforming framework for 5G millimeter wave massive MIMO multicellular networks. IEEE Access 2022, 10, 91597–91609.
17. Iliadis, L.A.; Zaharis, Z.D.; Sotiroudis, S.; Sarigiannidis, P.; Karagiannidis, G.K.; Goudos, S.K. The road to 6G: A comprehensive survey of deep learning applications in cell-free massive MIMO communications systems. EURASIP J. Wirel. Commun. Netw. 2022, 2022, 68.
18. Chen, J.; Tao, J.; Luo, S.; Li, S.; Zhang, C.; Xiang, W. A deep learning driven hybrid beamforming method for millimeter wave MIMO system. Digit. Commun. Netw. 2023, 9, 1291–1300.
19. Jeyakumar, P.; Ramesh, A.; Srinitha, S.; Nishant, V.; Gowri, P.; Muthuchidambaranathan, P. Two-stage deep learning-based hybrid precoder design for very large scale massive MIMO systems. Phys. Commun. 2022, 54, 101835.
20. Zheng, S.; Ding, C.; Nie, F. Regularized singular value decomposition and application to recommender system. arXiv 2018, arXiv:1804.05090.
21. Nie, F.; Huang, H.; Cai, X.; Ding, C. Efficient and robust feature selection via joint ℓ_{2,1}-norms minimization. Adv. Neural Inf. Process. Syst. 2010, 23, 1813–1821.
22. Peken, T.; Adiga, S.; Tandon, R.; Bose, T. Deep learning for SVD and hybrid beamforming. IEEE Trans. Wirel. Commun. 2020, 19, 6621–6642.
23. Elbir, A.M. A deep learning framework for hybrid beamforming without instantaneous CSI feedback. IEEE Trans. Veh. Technol. 2020, 69, 11743–11755.
24. Hojatian, H.; Nadal, J.; Frigon, J.F.; Leduc-Primeau, F. Unsupervised deep learning for massive MIMO hybrid beamforming. IEEE Trans. Wirel. Commun. 2021, 20, 7086–7099.
25. Boyd, S.; Parikh, N.; Chu, E.; Peleato, B.; Eckstein, J. Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 2011, 3, 1–122.
26. Rajput, K.P.; Maity, P.; Srivastava, S.; Sharma, V.; Venkategowda, N.K.D.; Jagannatham, A.K.; Hanzo, L. Robust Linear Hybrid Beamforming Designs Relying on Imperfect CSI in mmWave MIMO IoT Networks. arXiv 2022, arXiv:2212.12917.
27. Chen, Z.; Li, Y.; Zhang, R.; Liu, R.; Niu, Z.; Li, Q. Robust Hybrid Beamforming Design for Multi-RIS Assisted Millimeter Wave Systems With Imperfect CSI. IEEE Trans. Wirel. Commun. 2023, 22, 2640–2654.
28. Shi, S.; Cai, Y.; Hu, Q.; Champagne, B.; Hanzo, L. Deep-unfolding neural-network aided hybrid beamforming based on symbol-error probability minimization. IEEE Trans. Veh. Technol. 2022, 72, 529–545.

Figure 1. Hybrid beamforming architecture.

Figure 2. DL-ADMM network architecture.

Figure 3. Regularized hybrid beamforming architecture.

Figure 4. Convergence behavior of the proposed DL-ADMM framework. Training and validation losses for (a) $32 \times 32$ and (b) $64 \times 64$ antenna configurations.

Figure 5. Spectral efficiency vs. SNR for $(N_{RFt}, N_{RFr}) = (4, 4)$, $N_{s} = 4$, and antenna configurations (a) $(N_{t}, N_{r}) = (32, 32)$ and (b) $(N_{t}, N_{r}) = (64, 64)$, with channel estimation error variance $\sigma_{e}^{2} = 0.1$.

Figure 6. Spectral efficiency vs. channel estimation error variance for $(N_{RFt}, N_{RFr}) = (4, 4)$, $N_{s} = 4$, and antenna configurations (a) $(N_{t}, N_{r}) = (32, 32)$ and (b) $(N_{t}, N_{r}) = (64, 64)$, at SNR = 10 dB.

Figure 7. Error analysis for channel estimation: histograms (left) and empirical PDFs (right) for the $32 \times 32$ and $64 \times 64$ antenna configurations.

Figure 8. Complexity comparison of hybrid beamforming algorithms under two antenna configurations, $32 \times 32$ and $64 \times 64$, with $N_{RFt} = N_{RFr} = 4$ and $N_{s} = 4$. Complexity is shown on a log scale.

Table 1. Average relative error of hybrid beamforming methods with respect to the Optimal (digital-only) benchmark.

$(N_{t}, N_{r})$	ADMM	APS	OMP	DL-ADMM-Reg	[18]	[19]
$(32, 32)$	0.10	0.05	0.30	0.02	0.01	0.02
$(64, 64)$	0.08	0.05	0.31	0.004	0.005	0.01
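
The caption of Table 1 does not spell out how the average relative error is computed. One plausible reading, given here only as an assumption, is the mean over the evaluated operating points of $|SE_{opt} - SE_{method}| / SE_{opt}$, where $SE_{opt}$ is the spectral efficiency of the Optimal (digital-only) benchmark. The function name and the sample values below are hypothetical and are not taken from the paper's results.

```python
def average_relative_error(se_method, se_optimal):
    """Mean of |SE_opt - SE_method| / SE_opt over paired operating points.

    Assumption: this is one plausible definition of the metric in Table 1;
    the caption itself does not state the formula.
    """
    if len(se_method) != len(se_optimal) or not se_optimal:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for method_value, optimal_value in zip(se_method, se_optimal):
        total += abs(optimal_value - method_value) / optimal_value
    return total / len(se_optimal)


# Illustrative, made-up spectral efficiencies (bits/s/Hz) at two SNR points.
example_optimal = [10.0, 20.0]
example_method = [9.0, 19.0]
example_error = average_relative_error(example_method, example_optimal)
print(f"average relative error: {example_error:.3f}")
```

Under this reading, a value such as 0.02 would mean the method loses about 2% of the benchmark spectral efficiency on average.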

Table 2. Complexities of different algorithms.

Algorithm	Dominant Complexity (per Iteration)
ADMM	$O(3 N_{t(r)} N_{RFt(r)} N_{s} \times \text{number of iterations})$
OMP	$O(N_{t(r)}^{2} N_{RFt(r)} N_{s})$ [13]
APS	$O(2 N_{t(r)} N_{RFt(r)} N_{cl} M)$, with $M = 10$ (number of iterations) [14]
DL-ADMM	$O(3 N_{t(r)} N_{RFt(r)} N_{s} \times \text{number of layers})$
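
To make the scaling in Table 2 concrete, the expressions can be evaluated as raw operation counts for the two antenna sizes used in the figures, with $N_{RFt(r)} = 4$ and $N_{s} = 4$. This is a rough sketch only: the ADMM iteration count, the number of DL-ADMM layers, and $N_{cl}$ are not given in this part of the paper, so the defaults below (`n_iter=20`, `n_layers=5`, `n_cl=5`) are placeholder assumptions, and the function name is hypothetical.

```python
def dominant_complexity(n_ant, n_rf, n_s, n_iter=20, n_layers=5, n_cl=5, m_aps=10):
    """Evaluate the Table 2 expressions as operation counts.

    n_ant is N_t (or N_r), n_rf is N_RFt (or N_RFr), n_s is N_s.
    n_iter, n_layers and n_cl are assumed placeholder values;
    m_aps = 10 follows the M = 10 stated in Table 2.
    """
    return {
        "ADMM": 3 * n_ant * n_rf * n_s * n_iter,
        "OMP": n_ant ** 2 * n_rf * n_s,
        "APS": 2 * n_ant * n_rf * n_cl * m_aps,
        "DL-ADMM": 3 * n_ant * n_rf * n_s * n_layers,
    }


for n_ant in (32, 64):
    counts = dominant_complexity(n_ant, n_rf=4, n_s=4)
    print(f"N = {n_ant}: {counts}")
```

Two properties hold regardless of the placeholder values: the OMP count grows quadratically in the antenna number while the other three grow linearly, and the ADMM to DL-ADMM ratio equals the number of iterations divided by the number of layers.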


