Article

Deep Learning for Joint Pilot, Channel Feedback and Sub-Array Hybrid Beamforming in FDD Massive MU-MIMO-OFDM Systems

1 Science and Technology on Micro-System Laboratory, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai 200050, China
2 School of Electronic, Electrical and Communication, University of Chinese Academy of Sciences, Beijing 100049, China
* Author to whom correspondence should be addressed.
Electronics 2026, 15(6), 1255; https://doi.org/10.3390/electronics15061255
Submission received: 11 February 2026 / Revised: 10 March 2026 / Accepted: 16 March 2026 / Published: 17 March 2026
(This article belongs to the Section Microwave and Wireless Communications)

Abstract

In frequency division duplex (FDD) massive multiple-input multiple-output orthogonal frequency division multiplexing (MIMO-OFDM) systems, the sub-array multi-user (MU) hybrid beamforming architecture is highly attractive because of its low hardware cost and high energy efficiency. However, downlink channel state information (CSI) acquisition and hybrid beamformer optimization remain challenging due to the large feedback overhead and the non-convexity of the beamforming design. To address these issues, we propose an end-to-end deep learning (DL) framework that jointly optimizes pilot training, CSI feedback, and hybrid beamforming, overcoming the limitations of conventional independently designed modules. At the core of the network, we introduce the star efficient location attention (StarELA) module, which combines the implicit high-dimensional representation capability of star operations (element-wise multiplication) with the fine-grained feature localization of efficient location attention (ELA). In addition, for wideband digital beamformer generation, we exploit inter-subcarrier correlation and design a frequency-domain seed generation and interpolation upsampling strategy, which significantly reduces network parameters. Experimental results show that the proposed method approaches the upper-bound performance of conventional hybrid beamforming with ideal CSI, while consistently outperforming existing benchmark methods.

1. Introduction

Following the global commercial deployment of 5G networks, research on the vision and enabling technologies of sixth-generation mobile communications (6G) has been fully initiated [1]. As cornerstones for enhancing spectral efficiency and system capacity, massive multiple-input multiple-output (MIMO) and the emerging extremely large-scale MIMO (XL-MIMO) [2] will play a central role in 6G evolution. By deploying large-scale antenna arrays at the base station (BS), MIMO systems can significantly boost throughput and suppress multi-user interference through spatial multiplexing and beamforming. However, realizing these benefits relies on the availability of accurate channel state information (CSI) at the BS. Specifically for downlink transmission, the BS requires precise CSI to compute optimal beamforming matrices to focus signal energy on target user equipment (UE), thereby achieving superior system performance.
In time division duplex (TDD) systems, the BS can leverage channel reciprocity to acquire downlink CSI from uplink pilots. However, in frequency division duplex (FDD) systems, such reciprocity is absent as uplink and downlink operate on different frequency bands. Consequently, UEs must estimate downlink CSI from pilots transmitted by the BS and feed it back for beamforming. With the massive scaling of antenna arrays, the associated feedback overhead grows prohibitively, posing a severe bottleneck for FDD massive MIMO systems [3].
Despite these overhead challenges, the dual-band nature of FDD architectures also opens up novel possibilities for multi-band fusion techniques. Recent studies have demonstrated that fusing uplink and downlink communication signals across different FDD bands can significantly enhance target sensing capabilities in integrated sensing and communication (ISAC) systems [4].
In addition, hardware cost and power consumption are pivotal factors in practical deployment. Fully digital beamforming, which requires one radio frequency (RF) chain per antenna, is often cost-prohibitive, especially in mmWave massive MIMO systems. To address this, the sub-array hybrid beamforming architecture [5,6] illustrated in Figure 1 offers a promising alternative by connecting each antenna to a single RF chain, which significantly reduces the required number of phase shifters (PSs). However, this architecture introduces rigorous design constraints [5]: the analog beamforming matrix must satisfy constant modulus and specific antenna connectivity requirements. Consequently, jointly optimizing the analog and digital beamformers becomes a non-convex problem that is difficult to solve using traditional methods.

1.1. Related Works

To address CSI feedback and beamforming challenges in FDD systems, research has evolved from conventional separated designs to deep learning-based approaches. In traditional communication frameworks, pilot design, channel estimation, feedback, and beamforming are optimized as independent modules. Early studies mainly relied on compressed sensing (CS) algorithms [7] to exploit channel sparsity in the angular or delay domain for CSI estimation; the resulting CSI was then fed back via codebook-based quantization or CS-based compression. Based on reconstructed CSI, the BS designed beamformers using zero-forcing (ZF), minimum mean-square error (MMSE) [8], and methods based on successive interference cancellation (SIC) [9] or alternating minimization (AltMin) [5].
Despite their effectiveness, these methods have apparent limitations. CS-based schemes depend strongly on sparsity assumptions and can degrade in rich-scattering environments. Codebook-based feedback faces a fundamental trade-off between overhead and accuracy. AltMin-based beamforming can approach fully digital performance, but its iterative optimization incurs high computational complexity, limiting real-time applicability in low-latency scenarios.
In recent years, deep learning (DL) has shown strong potential for wireless communications. CsiNet [10] modeled CSI as an image and introduced an autoencoder-based feedback network, significantly outperforming conventional CS baselines. Subsequent studies further improved feedback accuracy [11,12,13,14,15]. CRNet [11] introduced multi-resolution feature extraction, CsiNet+ [12] adopted deeper architectures, and TransNet [13] employed Transformer mechanisms to capture long-range dependencies.
However, most of these DL-based methods still optimize CSI reconstruction error, typically measured by the mean squared error (MSE), as the primary objective. The BS uses fed-back CSI for downlink beamforming, and a lower reconstruction error does not necessarily lead to a higher beamforming gain. In massive MIMO systems, beamforming performance is more sensitive to physically meaningful features (e.g., dominant-path angles and gains), while MSE treats all entries equally and may preserve less relevant details. As a result, methods with low MSE can still yield suboptimal system performance, especially under the separated design where pilot, feedback, and beamforming are optimized independently. Therefore, CSI feedback should be evaluated based on its impact on downstream beamforming and overall communication performance rather than solely on reconstruction error.
To resolve these limitations, recent studies [16,17,18,19,20,21,22] have explored end-to-end joint modeling of pilot training, channel feedback, and beamforming with direct sum-rate maximization. Ref. [16] proposed a joint feedback and multi-user (MU) fully digital precoding network that achieved performance close to perfect-CSI linear precoding. Ref. [17] considered FDD orthogonal frequency division multiplexing (OFDM) systems and proposed a joint CSI feedback and precoding method based on deep joint source-channel coding techniques. Ref. [22] developed a precoding-oriented feedback mechanism with a dedicated loss function design to balance feedback overhead and sum rate, further demonstrating the advantage of end-to-end joint optimization.
Nevertheless, most existing end-to-end methods assume fully digital or fully connected hybrid beamforming at the BS. Although fully connected architectures can achieve strong performance, they require many phase shifters and complex RF interconnections, resulting in high hardware costs and limited scalability for deployment [23].
Beyond these implementation costs, practical CSI acquisition is further affected by hardware non-idealities, including phase noise, nonlinear amplification, synchronization offsets, and finite-resolution phase shifters. Such impairments are especially detrimental to phase-sensitive applications, such as interferometric angle-of-arrival estimation [24] and beamforming [25,26], where even small phase errors can introduce artifacts and degrade CSI accuracy. Since traditional separated block designs are highly vulnerable to the propagation of such estimation errors, there is a strong motivation to develop end-to-end joint optimization frameworks. By directly maximizing the ultimate system performance rather than relying on intermediate explicit CSI estimation, end-to-end learning naturally mitigates the accumulation of these cascading errors.
Regarding the sub-array architecture, which is of greater practical value but involves more complex constraints, relevant research is still in its infancy. Ref. [21] pioneered the end-to-end joint design of sub-array multi-user hybrid beamforming architectures, proposing a lightweight attention network called EFBAttnNET. However, this work only considered narrowband flat fading channels. In practical wideband OFDM systems, channels exhibit frequency-selective fading. Directly extending narrowband architectures to wideband scenarios fails to leverage inter-subcarrier frequency correlations, resulting in linear scaling of model parameters with the number of subcarriers.

1.2. Contributions

The main contributions of this paper are summarized as follows:
  • We introduce the sub-array hybrid beamforming architecture into FDD MU-MIMO-OFDM systems and propose a DL-based end-to-end framework for joint pilot training, CSI feedback, and hybrid beamforming. The proposed framework, comprising the pilot network, the feedback network, and the beamforming network, is trained in an unsupervised manner to maximize the system sum rate. It adaptively learns task-oriented pilot patterns and feedback features, avoiding the explicit channel estimation and CSI reconstruction steps of conventional separated designs.
  • We design a star efficient location attention (StarELA) module that combines the implicit high-dimensional mapping capability of star operations (element-wise multiplication) with the position-aware local interaction modeling of efficient location attention (ELA) [27]. As the core block in both the feedback and beamforming networks, StarELA improves channel feature extraction efficiency and effectiveness.
  • We propose a seed generation and interpolation upsampling strategy for the digital beamforming branch of the beamforming network at the BS. By exploiting strong inter-subcarrier correlation in wideband OFDM channels, this mechanism reduces model parameters while preserving frequency-domain smoothness and generalization of the generated digital beamformers.
  • Simulation results demonstrate that the proposed method approaches the performance of conventional sub-array hybrid beamforming with ideal CSI and outperforms existing deep learning baselines under different feedback bit budgets, exhibiting superior robustness even with limited pilot overhead.
The rest of this paper is organized as follows. Section 2 introduces the system model and formulates the joint optimization problem. Section 3 details the proposed end-to-end deep learning framework, encompassing the pilot network, the feedback network, the hybrid beamforming network, and the core StarELA module. Section 4 presents the numerical results and comprehensive performance comparisons. Finally, Section 5 concludes the paper and discusses future research directions.

2. System Model and Problem Formulation

In this section, we first introduce the system model under the sub-array hybrid beamforming architecture; then we describe the processes of downlink pilot transmission, CSI feedback, and beamforming; and finally, we formulate the optimization problem for maximizing the system sum rate.

2.1. System Model

We consider an FDD massive MU-MIMO-OFDM system with sub-array hybrid beamforming, where a BS equipped with a uniform linear array (ULA) of $N_t$ antennas and $K$ RF chains serves $K$ single-antenna UEs over $N_c$ OFDM subcarriers. The received signal at the $k$-th UE on the $n$-th subcarrier is given by:
$$y_k[n] = \mathbf{h}_k[n]^{H} \mathbf{v}_k[n] s_k[n] + \sum_{i \neq k} \mathbf{h}_k[n]^{H} \mathbf{v}_i[n] s_i[n] + z_k[n] \tag{1}$$
where $\mathbf{h}_k[n] \in \mathbb{C}^{N_t \times 1}$ and $\mathbf{v}_k[n] \in \mathbb{C}^{N_t \times 1}$ are the downlink channel vector and equivalent beamforming vector for the $k$-th UE on the $n$-th subcarrier, respectively. $s_k[n]$ denotes the data symbol transmitted to the $k$-th UE on the $n$-th subcarrier, and $z_k[n] \sim \mathcal{CN}(0, \sigma_n^2)$ is additive white Gaussian noise (AWGN).
For sub-array hybrid beamforming, the $N_t$ antennas are uniformly allocated to the $K$ RF chains, so that each RF chain is connected to $N_g = N_t / K$ antennas. The equivalent beamforming vector $\mathbf{v}_k[n]$ is defined as:
$$\mathbf{v}_k[n] = \mathbf{A} \mathbf{C} \mathbf{d}_k[n] \tag{2}$$
where $\mathbf{A} \triangleq \mathrm{diag}(e^{j \boldsymbol{\theta}_A}) \in \mathbb{C}^{N_t \times N_t}$ is the analog beamformer, a constant-modulus diagonal matrix with phase vector $\boldsymbol{\theta}_A \in \mathbb{R}^{N_t \times 1}$, and $\mathbf{d}_k[n] \in \mathbb{C}^{K \times 1}$ is the digital beamformer for UE $k$ on subcarrier $n$. The antenna connection matrix $\mathbf{C} = \mathbf{I}_K \otimes \mathbf{1}_{N_g} \in \{0, 1\}^{N_t \times K}$ is a block-diagonal matrix representing the physical connection constraints between RF chains and antenna arrays in the sub-array architecture, where $\mathbf{I}_K \in \mathbb{R}^{K \times K}$ is the identity matrix and $\mathbf{1}_{N_g} \in \mathbb{R}^{N_g \times 1}$ is an all-ones vector.
Furthermore, let $\mathbf{D}[n] \triangleq [\mathbf{d}_1[n], \ldots, \mathbf{d}_K[n]] \in \mathbb{C}^{K \times K}$ denote the digital beamforming matrix on subcarrier $n$. The equivalent beamforming matrix $\mathbf{V}[n] \triangleq [\mathbf{v}_1[n], \ldots, \mathbf{v}_K[n]] \in \mathbb{C}^{N_t \times K}$ must satisfy the following power constraint:
$$\|\mathbf{V}[n]\|_F^2 = \|\mathbf{A} \mathbf{C} \mathbf{D}[n]\|_F^2 \leq P / N_c, \quad \forall n, \tag{3}$$
where $P$ denotes the BS's total transmit power budget per OFDM symbol, summed over all $N_t$ antennas and all $N_c$ subcarriers.
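The sub-array structure and the power constraint in Equation (3) can be checked numerically. The following NumPy sketch is illustrative only: the helper name `subarray_beamformer`, the random phases, and the toy sizes ($N_t = 8$, $K = 2$) are assumptions made for this example and are not taken from the paper.

```python
import numpy as np

def subarray_beamformer(theta_a, d, k_rf):
    """Return V = A C D for one subcarrier (illustrative helper, not from the paper)."""
    n_t = theta_a.shape[0]
    n_g = n_t // k_rf                               # antennas per RF chain, N_g = N_t / K
    a = np.diag(np.exp(1j * theta_a))               # constant-modulus analog beamformer A
    c = np.kron(np.eye(k_rf), np.ones((n_g, 1)))    # connection matrix C = I_K (x) 1_{N_g}
    return a @ c @ d, a, c

rng = np.random.default_rng(0)
n_t, k_rf = 8, 2                                    # toy sizes chosen for readability
theta_a = rng.uniform(-np.pi, np.pi, n_t)
d = rng.standard_normal((k_rf, k_rf)) + 1j * rng.standard_normal((k_rf, k_rf))
v, a, c = subarray_beamformer(theta_a, d, k_rf)

# Each antenna (row of A C) is driven by exactly one RF chain, with unit modulus.
ac = a @ c
rows_nonzero = np.count_nonzero(np.abs(ac) > 1e-12, axis=1)

# Since (A C)^H (A C) = N_g I_K, the transmit power is ||V||_F^2 = N_g ||D||_F^2.
power_v = np.linalg.norm(v, "fro") ** 2
power_d = np.linalg.norm(d, "fro") ** 2
```

Because $(\mathbf{A}\mathbf{C})^{H}\mathbf{A}\mathbf{C} = N_g \mathbf{I}_K$, the power constraint on $\mathbf{V}[n]$ is equivalent to a scaled Frobenius-norm constraint on $\mathbf{D}[n]$ alone, independent of the analog phases.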
For the channel vector $\mathbf{h}_k[n]$, we adopt a typical sparse channel model [28], where each UE's channel consists of $L_{\mathrm{path}}$ dominant propagation paths, as shown below:
$$\mathbf{h}_k[n] = \sqrt{\frac{N_t}{L_{\mathrm{path}}}} \sum_{p=1}^{L_{\mathrm{path}}} \alpha_{p,k} \, e^{-j 2 \pi \frac{n}{N_c} \tau_{p,k} F_s} \, \mathbf{a}_t(\phi_{p,k}) \tag{4}$$
where $\alpha_{p,k}$, $\phi_{p,k}$, and $\tau_{p,k}$ denote the complex gain, angle of departure (AoD), and delay of the $p$-th path for the $k$-th UE, respectively. $F_s$ is the OFDM sampling rate. The adopted model is intended to characterize sparse directional wideband propagation conditions that are more representative of mmWave scenarios, in which hybrid beamforming is especially well suited. Accordingly, it does not explicitly target dense indoor rich-scattering channels dominated by strong diffuse multipath.
The transmitter employs a ULA, so the transmit array response vector can be expressed as:
$$\mathbf{a}_t(\phi) = \frac{1}{\sqrt{N_t}} \left[ 1, \; e^{j \frac{2 \pi d}{\lambda} \sin(\phi)}, \; e^{j 2 \frac{2 \pi d}{\lambda} \sin(\phi)}, \; \ldots, \; e^{j (N_t - 1) \frac{2 \pi d}{\lambda} \sin(\phi)} \right]^{T} \tag{5}$$
where $\lambda$ denotes the carrier wavelength, and the antenna spacing is chosen as $d = \lambda / 2$. The phase shift between adjacent antenna elements is:
$$\Delta \psi = \frac{2 \pi d}{\lambda} \sin(\theta). \tag{6}$$
For any visible angle $\theta \in [-90^\circ, 90^\circ]$, when $d \leq \lambda / 2$, we have $\Delta \psi \in [-\pi, \pi]$, which guarantees a unique mapping from the physical angle to the array response and thus avoids spatial aliasing. In contrast, when $d > \lambda / 2$, phase wrapping causes different directions to share the same phase progression, resulting in spatial aliasing and grating lobes. Although any spacing satisfying $d \leq \lambda / 2$ is sufficient to prevent aliasing, the half-wavelength setting is commonly adopted because it is the largest alias-free spacing and therefore provides the highest angular resolution for a given number of antennas. Hence, $d = \lambda / 2$ is particularly suitable for angle-domain channel modeling and hybrid beamforming design.
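The channel model in Equation (4) and the aliasing argument can be illustrated with a short NumPy sketch. The function names, the example path parameters, and the negative sign convention of the delay phase are assumptions of this sketch; the delay is passed in normalized form $\tau F_s$.

```python
import numpy as np

def array_response(phi, n_t, d_over_lambda=0.5):
    """Unit-norm ULA response a_t(phi) for antenna spacing d / lambda."""
    m = np.arange(n_t)
    return np.exp(1j * 2 * np.pi * d_over_lambda * m * np.sin(phi)) / np.sqrt(n_t)

def sparse_channel(n_t, n_c, alpha, phi, tau_fs):
    """Return an (N_c, N_t) array whose n-th row is h_k[n]^T, following Equation (4).

    alpha, phi, tau_fs: per-path complex gain, AoD in radians, normalized delay tau * F_s.
    The minus sign in the delay phase is an assumed convention.
    """
    l_path = len(alpha)
    n = np.arange(n_c)[:, None]                                  # subcarrier index
    steer = np.stack([array_response(p, n_t) for p in phi])      # (L_path, N_t)
    delay = np.exp(-1j * 2 * np.pi * n * np.asarray(tau_fs)[None, :] / n_c)
    return np.sqrt(n_t / l_path) * (delay * np.asarray(alpha)[None, :]) @ steer

# Two-path example with hypothetical gains, angles, and delays.
h = sparse_channel(32, 32, alpha=[1.0, 0.5j], phi=[0.1, -0.3], tau_fs=[0.0, 2.0])

# With d = lambda, +30 and -30 degrees give identical responses (spatial aliasing);
# with d = lambda / 2 they remain distinguishable.
aliased = np.allclose(array_response(np.pi / 6, 8, 1.0), array_response(-np.pi / 6, 8, 1.0))
distinct = not np.allclose(array_response(np.pi / 6, 8, 0.5), array_response(-np.pi / 6, 8, 0.5))
```

At $d = \lambda$ the adjacent-element phase for $\pm 30^\circ$ is $\pm\pi$, which coincide modulo $2\pi$, so the two directions cannot be separated by the array.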

2.2. Problem Formulation

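In FDD systems, UEs need to estimate downlink CSI from received downlink pilot signals and feed it back to the BS. We denote the pilot matrix transmitted by the BS as $\tilde{\mathbf{X}} \in \mathbb{C}^{N_t \times L}$, where $L$ is the pilot length. The pilot signal received by UE $k$ on subcarrier $n$ can be expressed as:
$$\tilde{\mathbf{y}}_k[n] = \mathbf{h}_k[n]^{H} \tilde{\mathbf{X}} + \tilde{\mathbf{z}}_k[n], \tag{7}$$
where $\tilde{\mathbf{y}}_k[n] \in \mathbb{C}^{1 \times L}$, and $\tilde{\mathbf{z}}_k[n] \sim \mathcal{CN}(\mathbf{0}, \sigma_n^2 \mathbf{I}_L)$ is AWGN.
We define $\tilde{\mathbf{Y}}_k \triangleq [\tilde{\mathbf{y}}_k[1], \ldots, \tilde{\mathbf{y}}_k[N_c]]^{T} \in \mathbb{C}^{N_c \times L}$, $\mathbf{H}_k \triangleq [\mathbf{h}_k[1], \ldots, \mathbf{h}_k[N_c]]^{H} \in \mathbb{C}^{N_c \times N_t}$, and $\tilde{\mathbf{Z}}_k \triangleq [\tilde{\mathbf{z}}_k[1], \ldots, \tilde{\mathbf{z}}_k[N_c]]^{T} \in \mathbb{C}^{N_c \times L}$. Then, the pilot signals received by UE $k$ over all subcarriers can be written as:
$$\tilde{\mathbf{Y}}_k = \mathbf{H}_k \tilde{\mathbf{X}} + \tilde{\mathbf{Z}}_k. \tag{8}$$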
In the considered wideband OFDM system, the BS transmits $L$ consecutive OFDM pilot symbols over the $N_c$ subcarriers. After digital baseband processing and inverse fast Fourier transform (IFFT), the resulting RF signals are weighted by the analog phase-shifter network in the sub-array architecture. Because the employed analog phase shifters apply the same phase control across the entire operating band, the equivalent antenna-domain pilot matrix $\tilde{\mathbf{X}}$ is modeled as frequency-flat with respect to the subcarrier index. Consequently, the same antenna-domain pilot matrix is shared across all subcarriers, whereas the received pilot observations remain frequency-selective due to the subcarrier-dependent channel response. Moreover, under the phase-only analog implementation, $\tilde{\mathbf{X}}$ satisfies the following element-wise constant-modulus constraint:
$$\left| [\tilde{\mathbf{X}}]_{i,j} \right| = \sqrt{\frac{P}{N_c N_t}}, \quad \forall i, j. \tag{9}$$
Specifically, the pilot transmission is subject to the same total power budget $P$ as the downlink data transmission, so that each subcarrier has an average power of $P / N_c$. Spreading this power equally over the $N_t$ antennas yields the element-wise magnitude $\sqrt{P / (N_c N_t)}$.
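The pilot model of Equations (7) to (9) can be simulated in a few lines. This is a minimal sketch under stated assumptions: the pilot length $L = 8$, the noise variance, and the random placeholder channel are illustrative values that do not come from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n_t, n_c, l_pilot, p, sigma2 = 32, 32, 8, 1.0, 0.1   # L = 8 and sigma2 are assumed values

# Placeholder channel: row n plays the role of h_k[n]^H (any N_c x N_t complex matrix works).
h_k = (rng.standard_normal((n_c, n_t)) + 1j * rng.standard_normal((n_c, n_t))) / np.sqrt(2)

# Constant-modulus pilots shared by all subcarriers, as required by Equation (9).
x = np.sqrt(p / (n_c * n_t)) * np.exp(1j * rng.uniform(-np.pi, np.pi, (n_t, l_pilot)))

# Circularly symmetric complex Gaussian noise with variance sigma2 per entry.
z = np.sqrt(sigma2 / 2) * (rng.standard_normal((n_c, l_pilot))
                           + 1j * rng.standard_normal((n_c, l_pilot)))

y_k = h_k @ x + z                               # stacked model, Equation (8)
y_row = h_k[5] @ x + z[5]                       # per-subcarrier model, Equation (7), n = 5
column_power = np.sum(np.abs(x) ** 2, axis=0)   # transmit power of each pilot symbol
```

Each pilot column carries power $N_t \cdot P / (N_c N_t) = P / N_c$, which matches the per-subcarrier budget used for data transmission.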
Upon receiving the pilot signal, UE $k$ extracts channel features and compresses them into a bitstream $\mathbf{q}_k$ via a feedback mapping function $\mathcal{F}(\cdot)$. This process can be expressed as:
$$\mathbf{q}_k = \mathcal{F}(\tilde{\mathbf{Y}}_k), \quad \mathbf{q}_k \in \{\pm 1\}^{B \times 1}. \tag{10}$$
The BS collects the feedback bitstreams from all UEs, $\mathbf{Q} = [\mathbf{q}_1, \mathbf{q}_2, \ldots, \mathbf{q}_K]^{T} \in \{\pm 1\}^{K \times B}$, and uses a beamforming design function $\mathcal{B}(\cdot)$ to jointly recover channel features and generate beamformers. This process is formulated as:
$$(\boldsymbol{\theta}_A, \hat{\mathbf{D}}) = \mathcal{B}(\mathbf{Q}), \tag{11}$$
where $\hat{\mathbf{D}} = [\hat{\mathbf{D}}[1], \ldots, \hat{\mathbf{D}}[N_c]] \in \mathbb{C}^{N_c \times K \times K}$ is the raw digital beamformer, and $\boldsymbol{\theta}_A$ is the phase vector of the analog beamformer.
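Finally, to satisfy the transmit power constraint in Equation (3), the BS performs power normalization on $\hat{\mathbf{D}}$:
$$\mathbf{D}[n] = \frac{\sqrt{P / N_c}}{\left\| \mathbf{A} \mathbf{C} \hat{\mathbf{D}}[n] \right\|_F} \hat{\mathbf{D}}[n], \quad \forall n. \tag{12}$$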
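The normalization step can be sketched as follows. The function name `normalize_digital` and the array layout (subcarrier index first) are assumptions of this example; it also assumes that no raw beamformer $\hat{\mathbf{D}}[n]$ is identically zero, since that would cause a division by zero.

```python
import numpy as np

def normalize_digital(d_hat, theta_a, k_rf, p):
    """Scale each D_hat[n] so that ||A C D[n]||_F^2 = P / N_c, as in Equation (12).

    d_hat: (N_c, K, K) raw digital beamformers; theta_a: (N_t,) analog phases.
    """
    n_c = d_hat.shape[0]
    n_g = theta_a.shape[0] // k_rf
    # A C without forming the diagonal matrix explicitly: scale row m of C by exp(j theta_m).
    ac = np.exp(1j * theta_a)[:, None] * np.kron(np.eye(k_rf), np.ones((n_g, 1)))
    v_hat = ac @ d_hat                               # broadcasts to (N_c, N_t, K)
    norms = np.linalg.norm(v_hat, axis=(1, 2))       # Frobenius norm per subcarrier
    return np.sqrt(p / n_c) / norms[:, None, None] * d_hat, ac
```

After this scaling every subcarrier uses exactly $P / N_c$, so the constraint in Equation (3) is met with equality and the total power over all subcarriers equals $P$.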
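We define the system sum rate as the sum of the achievable rates of all UEs and maximize it as the system objective. Based on the received signal model in Equation (1), the optimization problem is formulated as:
$$\begin{aligned} \max_{\tilde{\mathbf{X}}, \, \mathcal{F}(\cdot), \, \mathcal{B}(\cdot)} \quad & R = \frac{1}{N_c} \sum_{n=1}^{N_c} \sum_{k=1}^{K} \log_2 \left( 1 + \frac{\left| \mathbf{h}_k[n]^{H} \mathbf{A} \mathbf{C} \mathbf{d}_k[n] \right|^2}{\sum_{i \neq k} \left| \mathbf{h}_k[n]^{H} \mathbf{A} \mathbf{C} \mathbf{d}_i[n] \right|^2 + \sigma_n^2} \right) \\ \text{s.t.} \quad & \left| [\tilde{\mathbf{X}}]_{i,j} \right| = \sqrt{P / (N_c N_t)}, \quad \forall i, j, \\ & \left| [\mathbf{A}]_{m,m} \right| = 1, \quad \forall m, \\ & \left\| \mathbf{A} \mathbf{C} \mathbf{D}[n] \right\|_F^2 \leq P / N_c, \quad \forall n. \end{aligned} \tag{13}$$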
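The sum-rate objective can be evaluated directly from the channels and the equivalent beamformers. The sketch below is illustrative; the function name `sum_rate` and the tensor layout are assumptions, and the channel rows are expected to be already conjugated (that is, to hold $\mathbf{h}_k[n]^{H}$).

```python
import numpy as np

def sum_rate(h, v, sigma2):
    """Average sum rate R over subcarriers (objective of the optimization problem).

    h: (N_c, K, N_t) with h[n, k] = h_k[n]^H (row vector, already conjugated).
    v: (N_c, N_t, K) with v[n][:, k] = A C d_k[n].
    """
    g = np.abs(h @ v) ** 2                       # g[n, k, i] = |h_k[n]^H v_i[n]|^2
    signal = np.einsum("nkk->nk", g)             # desired-signal power (diagonal terms)
    interference = g.sum(axis=2) - signal        # power leaked from the other users
    sinr = signal / (interference + sigma2)
    return np.mean(np.sum(np.log2(1.0 + sinr), axis=1))

# Interference-free toy case: two orthogonal users with unit gain and unit noise power.
h_eye = np.eye(2)[None].astype(complex)
r_orth = sum_rate(h_eye, h_eye, 1.0)
```

In the interference-free toy case each user has an SINR of 1, so the sum rate is $2 \log_2 2 = 2$ bit/s/Hz; in training, the negative of this quantity would serve as the loss.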

3. Proposed Method

In this section, based on the system model and optimization objective in Section 2, we propose a DL-based method for joint downlink pilot training, uplink channel feedback, and hybrid beamforming for the sub-array architecture. As illustrated in Figure 2, the overall architecture comprises K parallel branches of pilot and feedback networks for distributed CSI acquisition, followed by a central hybrid beamforming network at the BS. Additionally, the system incorporates a sub-array antenna connection matrix and a power normalization module to enforce hardware constraints.

3.1. StarELA Module

To efficiently model received pilot signals and beamforming parameters under limited computational overhead, inspired by StarNet [29] and StarCANet [14], we propose a general feature processing unit named StarELA. StarELA fully exploits the advantage of star operations (element-wise multiplication) in constructing implicit high-dimensional spaces and integrates the ELA [27] mechanism, forming a compact module with strong nonlinear representation ability. The detailed structure of StarELA is shown in Figure 3.
StarELA first adopts depthwise convolution with a 7 × 7 kernel for local contextual feature extraction, effectively expanding the receptive field while keeping parameter and computational costs controllable. Subsequently, features are mapped into an expanded feature space through two parallel 1 × 1 pointwise convolution branches. One branch is nonlinearly activated and element-wise multiplied with the other branch. This parameter-free nonlinear interaction maps low-dimensional features into an implicit high-dimensional linear space, thereby improving feature representation.
Following the star operation, we introduce ELA for position-aware recalibration. Specifically, ELA decomposes the input into two orthogonal 1D feature vectors via global average pooling to capture long-range spatial dependencies. These vectors are processed with 1D convolutions to model local interactions, followed by group normalization (GN) to improve training stability. Subsequently, the generated attention weights are activated via the sigmoid function and aggregated to element-wise reweight the input features. This design achieves precise localization of salient regions with minimal computational overhead, effectively preventing information loss from channel reduction. Finally, StarELA restores the original channel dimension using a 1 × 1 pointwise convolution, followed by a 7 × 7 depthwise convolution for feature integration. Residual connections and DropPath are also incorporated to stabilize training and enhance generalization.
Overall, StarELA maintains a highly compact structure, avoids expensive global attention computation, and is suitable for deployment in resource-constrained sub-array hybrid beamforming systems. As the core feature interaction unit in our framework, its specific usage and role are further described in the following subnet designs.
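To make the data flow concrete, the following NumPy sketch implements a simplified forward pass of the two central ideas, the star operation and ELA-style recalibration. It is a sketch under stated assumptions rather than the authors' implementation: the 7 × 7 depthwise convolutions, group normalization, and DropPath are omitted, ReLU6 is assumed as the activation, and all weight names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1d_same(z, kernel):
    """Per-channel 1D convolution with zero padding; z: (C, N), kernel: (C, k), N >= k."""
    return np.stack([np.convolve(zc, kc, mode="same") for zc, kc in zip(z, kernel)])

def star_ela_core(x, w1, w2, w3, k_h, k_w):
    """Simplified StarELA core on a feature map x of shape (C, H, W).

    Omitted for brevity: depthwise 7x7 convolutions, group normalization, DropPath.
    """
    # Two parallel 1x1 (pointwise) convolutions expand C -> C_exp channels.
    a = np.einsum("ec,chw->ehw", w1, x)
    b = np.einsum("ec,chw->ehw", w2, x)
    star = np.clip(a, 0.0, 6.0) * b              # star operation: act(a) * b (ReLU6 assumed)

    # ELA: average-pool along each spatial axis, filter in 1D, gate with a sigmoid.
    att_h = sigmoid(conv1d_same(star.mean(axis=2), k_h))   # (C_exp, H)
    att_w = sigmoid(conv1d_same(star.mean(axis=1), k_w))   # (C_exp, W)
    gated = star * att_h[:, :, None] * att_w[:, None, :]

    # A 1x1 convolution restores C channels; the residual connection adds the input.
    return x + np.einsum("ce,ehw->chw", w3, gated)
```

The element-wise product of the two branches is what yields quadratic interaction terms between input channels without adding parameters, while the two 1D attention maps reweight rows and columns of the feature map independently.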

3.2. Pilot Network

As the initiation of the proposed pipeline, we introduce the pilot network to perform downlink pilot training. It is designed to learn pilot sequences that maximize system performance subject to hardware constraints and is jointly optimized with the subsequent networks.
During the pilot transmission stage, the BS broadcasts a common downlink pilot matrix $\tilde{\mathbf{X}} \in \mathbb{C}^{N_t \times L}$ to all UEs, with the received signal model governed by Equations (7) and (8). Given the hybrid beamforming architecture, $\tilde{\mathbf{X}}$ is strictly bound by the element-wise constant-modulus constraint, as detailed in Equation (9).
In [16], downlink pilot training is modeled as a bias-free fully connected layer with pilots learned as network weights; however, this formulation does not apply to our hybrid beamforming system, where pilots are implemented by analog phase shifters with fixed amplitudes.
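To address this, we adopt a phase-parameterized modeling strategy. Instead of optimizing $\tilde{\mathbf{X}}$ directly, we introduce a phase matrix $\boldsymbol{\theta}_{\tilde{X}} \in \mathbb{R}^{N_t \times L}$ as the trainable parameter set and express the pilot matrix as:
$$\tilde{\mathbf{X}} = \sqrt{\frac{P}{N_c N_t}} \, e^{j \boldsymbol{\theta}_{\tilde{X}}} = \sqrt{\frac{P}{N_c N_t}} \left[ \cos(\boldsymbol{\theta}_{\tilde{X}}) + j \sin(\boldsymbol{\theta}_{\tilde{X}}) \right]. \tag{14}$$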
This formulation intrinsically satisfies the constant modulus constraint, avoiding the need for complex non-convex projections or additional penalty terms. Furthermore, it facilitates seamless integration into the end-to-end backpropagation process of deep learning frameworks.
For network initialization, the pilot phase matrix $\boldsymbol{\theta}_{\tilde{X}}$ is initialized via uniform sampling of a discrete Fourier transform (DFT) matrix. This ensures favorable initial orthogonality and spectral coverage, promoting training stability and faster convergence. Subsequently, $\boldsymbol{\theta}_{\tilde{X}}$ is jointly optimized with the feedback network parameters $\Theta_F$ and the hybrid beamforming network parameters $\Theta_B$ to maximize the system sum rate.
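A possible realization of the phase parameterization and the DFT-based initialization is sketched below. The rule used to pick DFT columns (uniformly spaced column indices) is an assumption, since the text does not specify how the DFT matrix is sampled; the pilot length $L = 8$ and the function names are likewise illustrative.

```python
import numpy as np

def dft_pilot_phases(n_t, l_pilot):
    """Initial phases taken from L uniformly spaced columns of the N_t-point DFT matrix."""
    cols = (np.arange(l_pilot) * n_t) // l_pilot          # assumed column selection rule
    rows = np.arange(n_t)[:, None]
    return -2.0 * np.pi * rows * cols[None, :] / n_t      # phase of exp(-j 2 pi m c / N_t)

def pilot_matrix(theta_x, p, n_c):
    """Map trainable phases to constant-modulus pilots via the phase parameterization."""
    n_t = theta_x.shape[0]
    return np.sqrt(p / (n_c * n_t)) * np.exp(1j * theta_x)

theta0 = dft_pilot_phases(n_t=32, l_pilot=8)
x0 = pilot_matrix(theta0, p=1.0, n_c=32)
gram = x0.conj().T @ x0                                   # L x L Gram matrix of the pilots
```

With distinct DFT columns the initial pilots are mutually orthogonal, so the Gram matrix equals $(P / N_c)\,\mathbf{I}_L$; gradient updates then act on the phases only, and the constant-modulus property holds at every training step by construction.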
In practical deployments, the analog phase shifters execute both pilot transmission and analog beamforming via time-division multiplexing (TDM). They are initially configured with the learned phase matrix $\boldsymbol{\theta}_{\tilde{X}}$ to transmit downlink pilots, and dynamically reconfigured during data transmission to apply the generated analog beamformer phase vector $\boldsymbol{\theta}_A$. This TDM operation allows the system to fully reuse the sub-array hardware without incurring additional RF chain costs.

3.3. Feedback Network

The feedback network is deployed at the UE side to map the high-dimensional and noisy received pilot signals $\tilde{\mathbf{Y}}_k$ into compact binary bitstreams $\mathbf{q}_k$, as illustrated in Figure 4a. Its architecture is designed to efficiently extract and compress the coupled spatial-frequency features embedded in the pilot observations under limited feedback overhead. Since conventional fully connected or standard convolutional networks either incur high parameter complexity at the UE side or are less effective in modeling long-range dependencies across antennas and subcarriers, we adopt a deep encoder architecture with StarELA as the core feature extraction block. Owing to its ability to capture both local structures and global contextual dependencies with moderate computational complexity, this design enables the UE to generate compact binary representations while preserving the essential channel semantics required for subsequent beamforming tasks at the BS.
To make the complex pilot observation $\tilde{\mathbf{Y}}_k$ compatible with real-valued neural networks, we first split it into real and imaginary parts and stack them into a two-channel tensor $\tilde{\mathbf{Y}}_k^{\mathrm{in}}$. A 1 × 1 convolution is then used to expand the feature channel dimension to $C_{\mathrm{up}} = 8$, followed by two cascaded StarELA modules for feature refinement.
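The real/imaginary stacking can be written in a few lines. The channel-first layout `(2, N_c, L)`, the `float32` dtype, and the function names are assumptions of this sketch; a deep learning framework would typically add a leading batch dimension.

```python
import numpy as np

def to_two_channel(y_k):
    """Stack real and imaginary parts of Y_k (N_c x L) into a (2, N_c, L) real tensor."""
    return np.stack([y_k.real, y_k.imag], axis=0).astype(np.float32)

def from_two_channel(t):
    """Inverse mapping, useful for sanity checks."""
    return t[0] + 1j * t[1]

rng = np.random.default_rng(5)
y_k = rng.standard_normal((32, 8)) + 1j * rng.standard_normal((32, 8))
y_in = to_two_channel(y_k)
```

The mapping is lossless up to the chosen floating-point precision, so no channel information is discarded before the first convolution.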
Within StarELA, the 7 × 7 depthwise convolution mitigates local noise fluctuations in the pilot domain, while the star operation exploits the nonlinear coupling between pilot sequences and channel frequency responses to extract compact semantic features. The ELA mechanism factorizes 2D features into orthogonal 1D encodings and generates location-sensitive attention weights to recalibrate the features, enabling the preservation of salient information arising from multipath effects.
Subsequently, the refined features are projected back to two channels through a 1 × 1 convolution. Following a flattening operation, the features are compressed into a real-valued vector of length B by a multi-layer perceptron (MLP) comprising cascaded fully connected (FC) layers with batch normalization (BN).
Finally, to generate the uplink feedback bitstream $\mathbf{q}_k$, the feature vector is binarized. However, applying the sign function directly results in zero gradients almost everywhere during backpropagation, thereby impeding training. To address this vanishing-gradient problem, we adopt a sigmoid-adjusted straight-through estimator (STE) with slope annealing [16]. Specifically, during forward propagation, the quantizer employs the $\mathrm{sign}(\cdot)$ function, whereas during backward propagation, the derivative of the sign function is approximated by the gradient of a scaled sigmoid function. The approximation function is formulated as:
$$Q_{\mathrm{soft}}(x) = 2 \sigma(\alpha x) - 1 = \frac{2}{1 + e^{-\alpha x}} - 1, \tag{15}$$
where $\sigma(\cdot)$ denotes the sigmoid function and $\alpha$ is the annealing factor.
In our implementation, the annealing factor follows the explicit schedule:
$$\alpha_t = \min(\alpha_0 r^{t}, 10), \quad \alpha_0 = 1, \quad r = 1.0007, \tag{16}$$
where t denotes the training update step. Therefore, the quantizer starts from a smooth surrogate to ensure stable gradient propagation in the early stage and gradually approaches hard binary feedback as training proceeds. The upper bound of 10 prevents the surrogate gradient from becoming excessively sharp and numerically unstable in the late stage, while still ensuring that the final feedback symbols are effectively discrete.
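The quantizer and its annealing schedule can be sketched as follows. This NumPy version spells out the forward and backward rules by hand; in an automatic-differentiation framework the same behavior would be obtained with a custom gradient. The function names and the tie-breaking rule at $x = 0$ are assumptions.

```python
import numpy as np

def alpha_schedule(t, alpha0=1.0, r=1.0007, alpha_max=10.0):
    """Annealing factor alpha_t = min(alpha0 * r**t, alpha_max)."""
    return min(alpha0 * r ** t, alpha_max)

def quantize_forward(x):
    """Forward pass: hard sign quantizer producing +/-1 feedback bits."""
    return np.where(x >= 0.0, 1.0, -1.0)       # maps 0 to +1 (tie-breaking is an assumption)

def quantize_backward(x, grad_out, alpha):
    """Backward pass: use d/dx [2*sigmoid(alpha*x) - 1] in place of the sign derivative."""
    s = 1.0 / (1.0 + np.exp(-alpha * x))
    return grad_out * 2.0 * alpha * s * (1.0 - s)

bits = quantize_forward(np.array([-0.2, 0.0, 3.0]))
grad_at_zero = quantize_backward(np.array([0.0]), np.array([1.0]), alpha=2.0)
```

With $\alpha_0 = 1$ and $r = 1.0007$, the cap of 10 is reached after $\ln 10 / \ln 1.0007 \approx 3291$ updates; from then on the surrogate gradient at the origin stays fixed at $\alpha / 2 = 5$.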

3.4. Hybrid Beamforming Network

Deployed at the BS, the hybrid beamforming network is designed to generate the analog beamformer $\mathbf{A}$ and the digital beamformer $\mathbf{D}$ from the collected feedback bitstreams $\mathbf{Q}$. As illustrated in Figure 4b, the proposed network consists of two stages: multi-user feature refinement and fusion, and hybrid beamformer generation.
The network architecture is designed according to the physical characteristics of the wideband MU-MIMO-OFDM system and the hardware constraints of hybrid beamforming. Since suppressing multi-user interference requires joint spatial processing, a shared StarELA-based backbone is adopted to refine per-user features and capture their correlations in a unified global representation. Based on this global context, two dedicated branches are used to generate the analog and digital beamformers, respectively. This branch decomposition is motivated by their different physical properties: the analog beamformer is frequency-flat due to the phase-shifter implementation, whereas the digital beamformer must adapt to frequency-selective channel variations. To avoid the redundancy and parameter growth caused by direct per-subcarrier regression, the digital branch adopts a frequency-domain seed generation and interpolation strategy to recover the full-band beamformer from a compact set of anchor features.

3.4.1. Multi-User Feature Refinement and Fusion

To efficiently recover per-user channel spatial structures, we adopt a parallel processing strategy. The BS first expands the received feedback bits $\mathbf{Q} \in \{\pm 1\}^{K \times B}$ via two FC layers into $K$ independent feature tensors. These tensors are treated as a batch and fed into a shared backbone consisting of four cascaded StarELA modules. Leveraging the feature enhancement capabilities of StarELA, the backbone progressively reconstructs fine-grained channel details. The integrated ELA mechanism further enforces positional recalibration, encouraging the reconstructed features to preserve consistent physical patterns along subcarrier and antenna dimensions.
Following independent refinement, the feature tensors of the $K$ users are flattened, concatenated, and aggregated by an FC fusion layer. This fusion layer learns the spatial correlations among users, yielding a global context vector $\mathbf{f}_{\mathrm{fusion}}$ that serves as the basis for beamforming design.

3.4.2. Hybrid Beamformer Generation

Based on the global features $\mathbf{f}_{\mathrm{fusion}}$, the network splits into two branches to generate the analog and digital beamformers.
(1) Analog Beamforming Branch: Given that analog phase shifters are frequency-flat, this branch directly maps $\mathbf{f}_{\mathrm{fusion}}$ to an $N_t$-dimensional phase vector $\boldsymbol{\theta}_A$ via an FC layer. The constant-modulus analog beamformer is then constructed using Euler's formula: $\mathbf{A} = \mathrm{diag}(e^{j \boldsymbol{\theta}_A})$.
(2) Digital Beamforming Branch: In wideband OFDM systems, the digital beamformer should adapt to frequency-selective fading across subcarriers. Direct regression of all subcarrier parameters by FC layers often leads to overfitting and parameter explosion as $N_c$ increases. To address this, we propose an efficient frequency-domain seed generation and interpolation upsampling strategy, which reduces the parameter complexity from $O(N_c K^2)$ to $O(N_c K^2 / S)$, as follows:
  • Seed Generation: Exploiting the frequency correlation of channels, we first map $\mathbf{f}_{\mathrm{fusion}}$ to a low-resolution seed tensor $\mathbf{F}_{\mathrm{seed}} \in \mathbb{R}^{C_d \times N_{\mathrm{seed}}}$, where $C_d = 64$ is the hidden channel dimension, $N_{\mathrm{seed}} = N_c / S$ denotes the number of seed nodes, and $S = 4$ is the downsampling factor.
  • Interpolation and Refinement: To recover full-band resolution, $\mathbf{F}_{\mathrm{seed}}$ is upsampled along the frequency dimension via linear interpolation to obtain coarse features $\mathbf{F}_{\mathrm{up}} \in \mathbb{R}^{C_d \times N_c}$. According to the channel model in Equation (4), the frequency-domain evolution across subcarriers is governed by a complex exponential function. However, since the subcarrier interval covered by the downsampling factor $S$ is typically much smaller than the channel coherence bandwidth, this exponential phase rotation can be effectively approximated as a locally linear transformation. Therefore, we adopt a simple and lightweight linear interpolation for upsampling. Subsequently, we introduce a convolution block composed of a 1 × 3 1D convolution, BN, and ReLU to compensate for the linear approximation errors and recover the nonlinear channel dynamics.
Finally, a 1D convolution layer projects the refined features to the raw digital beamformer D ^ C N c × K × K , which is subsequently normalized to obtain D satisfying the total transmit power constraint according to Equation (12).
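The locally linear approximation behind the interpolation step can be checked numerically. The following is a minimal sketch in plain Python, not the authors' implementation: it interpolates a single complex sequence rather than a $C_d \times N_{\mathrm{seed}}$ feature tensor, the helper `upsample_linear` and the per-subcarrier phase step `PHI` are hypothetical, and the endpoint-aligned sampling grid is an assumption, since the exact interpolation convention is not stated in the text.

```python
import cmath

def upsample_linear(seed, n_out):
    """Linearly interpolate a 1-D sequence to n_out points (endpoints aligned)."""
    n_in = len(seed)
    if n_in < 2 or n_out < 2:
        raise ValueError("need at least two seed nodes and two output points")
    out = []
    for i in range(n_out):
        pos = i * (n_in - 1) / (n_out - 1)  # fractional index into the seed
        lo = min(int(pos), n_in - 2)        # left neighbour, clamped at the end
        w = pos - lo                        # weight of the right neighbour
        out.append((1 - w) * seed[lo] + w * seed[lo + 1])
    return out

N_C, S = 32, 4          # subcarriers and downsampling factor from the text
N_SEED = N_C // S       # 8 seed nodes
PHI = 0.05              # hypothetical phase rotation per subcarrier (radians)

def response(x):
    """Assumed single-path frequency response: a pure phase rotation."""
    return cmath.exp(-1j * PHI * x)

# Sample the response at the seed positions, then upsample to the full band.
seed = [response(i * (N_C - 1) / (N_SEED - 1)) for i in range(N_SEED)]
coarse = upsample_linear(seed, N_C)

# Largest deviation between the interpolated and the true response.
max_err = max(abs(coarse[n] - response(n)) for n in range(N_C))
print(f"max interpolation error: {max_err:.5f}")
```

For a slowly rotating phase the residual is small compared with the unit magnitude of the response, which is the part the subsequent convolution block is meant to correct. A faster rotation (larger `PHI`) increases the residual, which is consistent with the requirement that the seed spacing stay well below the coherence bandwidth.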

4. Numerical Results

4.1. Experimental Settings and Benchmarks

4.1.1. System Parameters and Dataset Generation

We consider a single-cell FDD massive MIMO-OFDM system, where a BS equipped with $N_t = 32$ antennas serves $K = 2$ single-antenna UEs over $N_c = 32$ subcarriers. The case of $K = 2$ is adopted as a representative multi-user setting, which allows us to evaluate the proposed framework under a controlled level of inter-user interference while keeping the simulation complexity manageable. Furthermore, since the proposed architecture is formulated with scalable parallel branches and shared network weights, it can in principle be extended to support a larger number of users in practical deployments.
Following the channel model in Equation (4), each UE’s channel consists of $L_{\mathrm{path}} = 2$ propagation paths, with the AoD uniformly distributed over $[-30^\circ, 30^\circ]$. This setting represents a sparse directional propagation scenario with limited angular spread, which is more consistent with sectorized cellular coverage than with dense indoor rich-scattering environments. The total transmit power is $P = 1$, and the system signal-to-noise ratio (SNR) is set to 10 dB.
As for the channel dataset, we generate 100,000 training samples and 10,000 testing samples based on these settings. All performance evaluations are conducted on the test set to ensure a fair comparison.

4.1.2. Training Settings

The proposed method adopts an end-to-end unsupervised learning strategy, with the loss function defined as the negative of the system sum rate, i.e., $\mathcal{L} = -R$. The network is implemented based on the PyTorch 2.6.0 framework and trained on an NVIDIA RTX 4090 GPU. We use the AdamW optimizer with an initial learning rate of $1 \times 10^{-3}$. To mitigate local optima and stabilize convergence, we apply a cosine annealing learning rate scheduler with warm-up. Specifically, the learning rate warms up for the first 15 epochs and then decays to a minimum of $5 \times 10^{-5}$ over a total of 300 epochs, with a batch size of 500.
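The learning rate schedule described above can be sketched as follows. This is an illustrative reconstruction in plain Python rather than the authors' training code: the peak rate $1 \times 10^{-3}$, the floor $5 \times 10^{-5}$, the 15 warm-up epochs, and the 300 total epochs come from the text, while the linear shape of the warm-up, its starting value, and the function name `learning_rate` are assumptions.

```python
import math

LR_MAX, LR_MIN = 1e-3, 5e-5           # peak and minimum learning rates
WARMUP_EPOCHS, TOTAL_EPOCHS = 15, 300

def learning_rate(epoch):
    """Learning rate for a 0-based epoch index (warm-up, then cosine decay)."""
    if epoch < WARMUP_EPOCHS:
        # Assumed linear warm-up that reaches LR_MAX at the last warm-up epoch.
        return LR_MAX * (epoch + 1) / WARMUP_EPOCHS
    # Progress through the cosine phase, 0 at its start and 1 at the final epoch.
    t = (epoch - WARMUP_EPOCHS) / (TOTAL_EPOCHS - 1 - WARMUP_EPOCHS)
    return LR_MIN + 0.5 * (LR_MAX - LR_MIN) * (1 + math.cos(math.pi * t))

schedule = [learning_rate(e) for e in range(TOTAL_EPOCHS)]
for e in (0, 14, 15, 150, 299):
    print(f"epoch {e:3d}: lr = {schedule[e]:.6f}")
```

In a PyTorch training loop, a comparable schedule would typically be obtained by chaining a warm-up scheduler with a cosine annealing scheduler; the closed form here only makes the shape of the curve explicit.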

4.1.3. Benchmarks

To validate the proposed method, we compare it against the following baselines, encompassing both traditional and deep learning-based schemes:
  • Full Digital ZF with Perfect CSI: The BS is assumed to have perfect CSI and performs fully digital ZF beamforming. This serves as a theoretical upper bound.
  • SDR-AltMin with Perfect CSI and Infinite Feedback: With perfect CSI at the BS, hybrid beamforming is designed using semidefinite relaxation based AltMin (SDR-AltMin) [5], representing the upper bound of conventional hybrid beamforming without estimation or feedback distortion.
  • SDR-AltMin with OMP-CE and Infinite Feedback: Each UE estimates downlink CSI via orthogonal matching pursuit (OMP) [30], and the estimated CSI is fed back losslessly. The BS then applies SDR-AltMin.
  • SDR-AltMin with OMP-CE and Finite Feedback: Each UE first performs OMP-based channel estimation, then quantizes path gain α and AoD ϕ using Lloyd-Max quantization [31] for limited feedback; the BS reconstructs CSI and applies SDR-AltMin.
  • JEFB-ResNet: As a deep learning baseline, we re-implement the residual network-based joint channel estimation feedback beamforming model (JEFB-ResNet) [17] and adapt it to the sub-array architecture for fair comparison.
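As an illustration of the first benchmark, the sketch below computes a fully digital ZF beamformer for $K = 2$ users on a single subcarrier and evaluates the resulting sum rate. It rests on several assumptions that are not taken from the paper: the channel is an i.i.d. complex Gaussian placeholder rather than the sparse model of Equation (4), the noise variance is set to $\sigma^2 = P / 10^{\mathrm{SNR}/10} = 0.1$, the sum rate uses the standard SINR expression, and the helper names `zf_beamformer` and `sum_rate` are hypothetical.

```python
import math
import random

def zf_beamformer(H, power=1.0):
    """ZF beamformer F = H^H (H H^H)^{-1} for two users, scaled to total power."""
    h1, h2 = H
    inner = lambda a, b: sum(x * y.conjugate() for x, y in zip(a, b))  # a b^H
    g11, g12 = inner(h1, h1), inner(h1, h2)
    g21, g22 = inner(h2, h1), inner(h2, h2)
    det = g11 * g22 - g12 * g21
    # Closed-form inverse of the 2 x 2 Gram matrix G = H H^H.
    i11, i12, i21, i22 = g22 / det, -g12 / det, -g21 / det, g11 / det
    # Row n of F holds the weights of antenna n for user 1 and user 2.
    F = [[a.conjugate() * i11 + b.conjugate() * i21,
          a.conjugate() * i12 + b.conjugate() * i22] for a, b in zip(h1, h2)]
    norm = math.sqrt(sum(abs(v) ** 2 for row in F for v in row))
    scale = math.sqrt(power) / norm
    return [[v * scale for v in row] for row in F]

def sum_rate(H, F, noise_var):
    """Sum of log2(1 + SINR_k) over users for row channels H and beamformer F."""
    rate = 0.0
    for k, hk in enumerate(H):
        gains = [abs(sum(h * row[j] for h, row in zip(hk, F))) ** 2
                 for j in range(len(H))]
        interference = sum(gains) - gains[k]
        rate += math.log2(1 + gains[k] / (interference + noise_var))
    return rate

random.seed(0)
NT, P, SNR_DB = 32, 1.0, 10.0
noise_var = P / 10 ** (SNR_DB / 10)   # assumed SNR definition: P / sigma^2
# Placeholder channel: i.i.d. complex Gaussian rows, one per user.
H = [[complex(random.gauss(0, 1), random.gauss(0, 1)) / math.sqrt(2)
      for _ in range(NT)] for _ in range(2)]
F = zf_beamformer(H, P)
rate = sum_rate(H, F, noise_var)
# Residual leakage of user 2's beam into user 1's receiver (should be ~0).
leak = abs(sum(h * row[1] for h, row in zip(H[0], F)))
print(f"sum rate: {rate:.2f} bits/s/Hz, leakage: {leak:.2e}")
```

Because ZF inverts the effective channel, the inter-user leakage vanishes up to rounding error, and the sum rate is limited only by noise and by the power spent on nulling. The hybrid schemes in the remaining benchmarks cannot reach this point exactly because of the constant modulus and sub-array constraints.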

4.2. Performance Comparison

To comprehensively evaluate the proposed method, we compare the system sum rate against various benchmarks under two pilot-length settings, i.e., L = 8 and L = 4 .
Figure 5 presents the sum rate performance with a pilot length of $L = 8$. It can be observed that the proposed method consistently outperforms the deep learning-based benchmark JEFB-ResNet across the entire feedback bit range ($B = 3$ to $60$). Specifically, at $B = 60$, our method achieves a sum rate of 12.34 bits/s/Hz, a gain of 0.7 bits/s/Hz over JEFB-ResNet. Moreover, when $B \geq 5$, the proposed method already surpasses the SDR-AltMin with OMP-CE and Infinite Feedback scheme. This result highlights the advantage of end-to-end joint optimization: unlike traditional separated pipelines, where residual channel estimation errors propagate to the beamforming stage and create a performance bottleneck, our framework is trained directly with the sum rate objective. This allows the network to allocate limited feedback bits to the most beamforming-relevant features (e.g., dominant path angles), thereby maximizing effective beamforming gain even with quantized feedback.
To further investigate the robustness of the proposed method under severely limited pilot resources, we evaluate the system sum rate when $L = 4$ ($L \ll N_t = 32$), as shown in Figure 6. In this scenario, traditional CS-based methods exhibit sharp performance degradation because sparse support recovery becomes unreliable with insufficient observations. In contrast, the proposed method remains robust, maintaining a sum rate of 12.03 bits/s/Hz at $B = 60$, a decrease of only about 0.3 bits/s/Hz compared to the $L = 8$ case. Furthermore, the proposed method continues to outperform JEFB-ResNet by a clear margin. This indicates that the StarELA module provides stronger feature extraction, enabling the system to recover channel semantic information even when the pilot overhead is halved.
Overall, the experimental results verify that the proposed method efficiently exploits bit resources, with the sum rate increasing steadily with $B$ and gradually approaching the upper bound of SDR-AltMin with perfect CSI. The consistent superiority over JEFB-ResNet under both pilot settings indicates that the introduced StarELA module and hybrid beamformer generation strategy enhance the system’s adaptability and spectral efficiency in FDD massive MIMO-OFDM systems.

5. Conclusions

In this paper, we propose an end-to-end deep learning framework for sub-array hybrid beamforming in FDD massive MU-MIMO-OFDM systems, jointly optimizing pilots, feedback, and beamforming. This approach eliminates the performance bottleneck caused by module independence in traditional designs. The framework incorporates a novel StarELA module for robust feature extraction and employs a digital beamformer generation strategy based on frequency–domain interpolation to reduce parameter complexity in wideband systems. Simulation results validate that our method outperforms existing deep learning benchmarks and closely approaches the theoretical upper bound of traditional algorithms with perfect CSI, demonstrating exceptional robustness in scenarios with limited pilot resources.
Future work will further validate the proposed framework through real-world over-the-air measurement campaigns and investigate its scalability under larger numbers of users and more complex multi-user interference conditions. In practical deployments, the achievable performance may be affected by non-ideal hardware factors, such as antenna mutual coupling, phase shifter quantization, and mismatch between measured channels and the adopted sparse channel model. Such studies will provide further insight into the robustness and practical feasibility of the proposed method.

Author Contributions

Conceptualization, K.Z. and H.W.; methodology, K.Z.; software, K.Z.; validation, K.Z. and H.W.; formal analysis, K.Z. and Y.X.; investigation, K.Z.; resources, Y.X.; data curation, K.Z.; writing—original draft preparation, K.Z.; writing—review and editing, K.Z., H.W., W.Y. and Y.X.; visualization, K.Z.; supervision, Y.X.; project administration, W.Y. and Y.X.; funding acquisition, Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Shanghai Gerontechnology Innovation Park Research and Development and Testing Public Service Platform Construction Project under Grant 24YL1901302.

Data Availability Statement

The data are available from the corresponding author upon reasonable request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wang, C.X.; You, X.; Gao, X.; Zhu, X.; Li, Z.; Zhang, C.; Wang, H.; Huang, Y.; Chen, Y.; Haas, H.; et al. On the road to 6G: Visions, requirements, key technologies, and testbeds. IEEE Commun. Surv. Tutor. 2023, 25, 905–974.
  2. Wang, Z.; Zhang, J.; Du, H.; Niyato, D.; Cui, S.; Ai, B.; Debbah, M.; Letaief, K.B.; Poor, H.V. A tutorial on extremely large-scale MIMO for 6G: Fundamentals, signal processing, and applications. IEEE Commun. Surv. Tutor. 2024, 26, 1560–1605.
  3. Guo, J.; Wen, C.K.; Jin, S.; Li, G.Y. Overview of deep learning-based CSI feedback in massive MIMO systems. IEEE Trans. Commun. 2022, 70, 8017–8045.
  4. Chowdary, A.; Bazzi, A.; Chafii, M. Uplink and downlink communications fusion for enhanced radar sensing. In Proceedings of the IEEE International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Shanghai, China, 25–28 September 2023; pp. 446–450.
  5. Yu, X.; Shen, J.C.; Zhang, J.; Letaief, K.B. Alternating minimization algorithms for hybrid precoding in millimeter wave MIMO systems. IEEE J. Sel. Top. Signal Process. 2016, 10, 485–500.
  6. Guo, Y.; Li, L.; Wen, X.; Chen, W.; Han, Z. Sub-array based hybrid precoding design for downlink millimeter-wave multi-user massive MIMO systems. In Proceedings of the International Conference on Wireless Communications and Signal Processing (WCSP), Nanjing, China, 11–13 October 2017; pp. 1–4.
  7. Qin, Z.; Fan, J.; Liu, Y.; Gao, Y.; Li, G.Y. Sparse representation for wireless communications: A compressive sensing approach. IEEE Signal Process. Mag. 2018, 35, 40–58.
  8. Xing, Y.; Chen, Y.; Yang, L. MMSE-based wideband hybrid precoding for massive MIMO systems. In Proceedings of the International Conference on Information and Communication Technology Convergence (ICTC), Jeju, Republic of Korea, 19–21 October 2016; pp. 18–20.
  9. Dai, L.; Gao, X.; Quan, J.; Han, S. Near-optimal hybrid analog and digital precoding for downlink mmWave massive MIMO systems. In Proceedings of the IEEE International Conference on Communications (ICC), London, UK, 8–12 June 2015; pp. 1334–1339.
  10. Wen, C.K.; Shih, W.T.; Jin, S. Deep learning for massive MIMO CSI feedback. IEEE Wirel. Commun. Lett. 2018, 7, 748–751.
  11. Lu, Z.; Wang, J.; Song, J. Multi-resolution CSI feedback with deep learning in massive MIMO system. In Proceedings of the IEEE International Conference on Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–6.
  12. Guo, J.; Wen, C.K.; Jin, S.; Li, G.Y. Convolutional neural network-based multiple-rate compressive sensing for massive MIMO CSI feedback: Design, simulation, and analysis. IEEE Trans. Wirel. Commun. 2020, 19, 2827–2840.
  13. Cui, Y.; Guo, A.; Song, C. TransNet: Full attention network for CSI feedback in FDD massive MIMO system. IEEE Wirel. Commun. Lett. 2022, 11, 903–907.
  14. Zhao, K.; Wu, H.; Xiong, Y.; Zhu, L.; Xu, M. StarCANet: A compact and efficient neural network for massive MIMO CSI feedback. IEEE Wirel. Commun. Lett. 2025, 14, 540–544.
  15. Tang, L.; Sun, Y.; Yao, S.; Xu, X.; Chen, H.; Luo, Z. Efficient multiple-input–multiple-output channel state information feedback: A semantic-knowledge-base-driven approach. Electronics 2025, 14, 1666.
  16. Sohrabi, F.; Attiah, K.M.; Yu, W. Deep learning for distributed channel feedback and multiuser precoding in FDD massive MIMO. IEEE Trans. Wirel. Commun. 2021, 20, 4044–4057.
  17. Wu, M.; Gao, Z.; Gao, Z.; Wu, D.; Yang, Y.; Huang, Y. Deep learning-based hybrid precoding for FDD massive MIMO-OFDM systems with a limited pilot and feedback overhead. In Proceedings of the IEEE International Conference on Communications Workshops (ICC Workshops), Seoul, Republic of Korea, 16–20 May 2022; pp. 318–323.
  18. Sun, Q.; Zhao, H.; Wang, J.; Chen, W. Deep learning-based joint CSI feedback and hybrid precoding in FDD mmWave massive MIMO systems. Entropy 2022, 24, 441.
  19. Gao, Z.; Wu, M.; Hu, C.; Gao, F.; Wen, G.; Zheng, D.; Zhang, J. Data-driven deep learning based hybrid beamforming for aerial massive MIMO-OFDM systems with implicit CSI. IEEE J. Sel. Areas Commun. 2022, 40, 2894–2913.
  20. Guo, Y.; Chen, W.; Xu, J.; Li, L.; Ai, B. Deep joint CSI feedback and multiuser precoding for MIMO OFDM systems. IEEE Trans. Veh. Technol. 2025, 74, 1730–1735.
  21. Lu, Z.; Zhang, X.; Zeng, R.; Wang, J. Towards efficient subarray hybrid beamforming: Attention network-based practical feedback in FDD massive MU-MIMO Systems. arXiv 2023.
  22. Carpi, F.; Venkatesan, S.; Du, J.; Viswanathan, H.; Garg, S.; Erkip, E. Learned precoding-oriented CSI Feedback in multi-cell multi-user MIMO Systems. IEEE Trans. Wirel. Commun. 2026, 25, 2359–2372.
  23. El Ayach, O.; Rajagopal, S.; Abu-Surra, S.; Pi, Z.; Heath, R.W. Spatially sparse precoding in millimeter wave MIMO systems. IEEE Trans. Wirel. Commun. 2014, 13, 1499–1513.
  24. Florio, A.; Avitabile, G.; Coviello, G. A linear technique for artifacts correction and compensation in phase interferometric angle of arrival estimation. Sensors 2022, 22, 1427.
  25. Myers, N.J.; Kannu, A.P. Impact of channel estimation errors on single stream MIMO beamforming. IEEE Commun. Lett. 2017, 21, 1345–1348.
  26. Pourmohammad, A.S.; Amirhossein, N.; Chen, S.C.; Rong-Ho, L. Deep learning-enhanced hybrid beamforming design with regularized SVD under imperfect channel information. Mathematics 2026, 14, 509.
  27. Xu, W.; Wan, Y.; Zhao, W. ELA: Efficient location attention for deep convolution neural networks. J. Real-Time Image Process. 2025, 22, 1–14.
  28. Sohrabi, F.; Yu, W. Hybrid digital and analog beamforming design for large-scale antenna arrays. IEEE J. Sel. Top. Signal Process. 2016, 10, 501–513.
  29. Ma, X.; Dai, X.; Bai, Y.; Wang, Y.; Fu, Y. Rewrite the stars. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 16–22 June 2024; pp. 5694–5703.
  30. Tropp, J.A.; Gilbert, A.C. Signal recovery from random measurements via orthogonal matching pursuit. IEEE Trans. Inf. Theory 2007, 53, 4655–4666.
  31. Lloyd, S. Least squares quantization in PCM. IEEE Trans. Inf. Theory 1982, 28, 129–137.
Figure 1. The architecture of sub-array hybrid beamforming.
Figure 2. The overall architecture of the proposed method.
Figure 3. The structure of the proposed StarELA.
Figure 4. The network architecture of the proposed (a) feedback network and (b) hybrid beamforming network.
Figure 5. Sum rate comparison of different methods in a 2-user FDD sub-array hybrid beamforming MIMO-OFDM system with $N_t = 32$, $N_c = 32$, and $L = 8$.
Figure 6. Sum rate comparison of different methods in a 2-user FDD sub-array hybrid beamforming MIMO-OFDM system with $N_t = 32$, $N_c = 32$, and $L = 4$.

Zhao, K.; Wu, H.; Yao, W.; Xiong, Y. Deep Learning for Joint Pilot, Channel Feedback and Sub-Array Hybrid Beamforming in FDD Massive MU-MIMO-OFDM Systems. Electronics 2026, 15, 1255. https://doi.org/10.3390/electronics15061255
