Abstract
Extremely Large-Scale Multiple-Input Multiple-Output (XL-MIMO) is positioned as a transformative technology for sixth-generation (6G) networks, effectively turning base stations into high-resolution sensing and communication hubs. However, the practical deployment of XL-MIMO is hindered by the “curse of dimensionality,” specifically the prohibitive overhead associated with Channel State Information (CSI) sensing and feedback, alongside the computational latency of massive antenna arrays. To resolve the conflict between high-resolution sensing requirements and limited bandwidth resources, this paper proposes a novel two-stage beamforming architecture that synergizes physics-aware dimensionality reduction with deep learning. First, by exploiting the inherent sparsity of XL-MIMO channels in the angle-delay domain, we design a Spatial–Frequency Concentration Block (SFCB). This module functions as a hard-attention sensing mechanism, performing efficient source-end dimensionality reduction on raw CSI at the User Equipment (UE) via precise feature extraction and adaptive energy truncation. Second, we develop a highly adaptable Direct Integrated Precoding Network (DIP-I). Departing from the conventional “sense-reconstruct-then-precode” paradigm, DIP-I learns end-to-end mapping to directly regress the optimal precoding matrix at the Base Station (BS). Comprehensive simulations utilizing the COST 2100 and QuaDRiGa hybrid channel models demonstrate that, under a massive 512-antenna configuration, the proposed framework achieves exceptional beamforming gain. Furthermore, it significantly reduces sensing data overhead and inference latency, offering a superior trade-off between spectral efficiency and hardware resource consumption for future 6G sensing-communication integrated systems.
1. Introduction
As the global research community pivots toward the sixth-generation (6G) mobile communication era, the boundary between sensing and communication is becoming increasingly blurred. The demand for ubiquitous connectivity, holographic communication, and the tactile internet has driven antenna technologies toward unprecedented scales [1]. Extremely Large-Scale Multiple-Input Multiple-Output (XL-MIMO) has emerged as a pivotal enabler to meet these rigorous demands. By deploying hundreds or even thousands of antennas at the Base Station (BS), XL-MIMO offers extremely high spatial resolution, effectively transforming the array into a massive sensor capable of resolving fine-grained electromagnetic environments [2,3]. Furthermore, the continuous evolution of massive arrays—incorporating components like reconfigurable intelligent surfaces (RISs) and movable time-modulated arrays—has empowered novel capabilities beyond traditional data transmission, ranging from integrated sensing and communication (ISAC) [4] to covert and secure satellite-terrestrial networking [5,6].
Unlike traditional massive MIMO, XL-MIMO systems frequently operate in the radiative near-field region due to the significantly expanded Rayleigh distance. This introduces complex electromagnetic characteristics, most notably the spherical wavefront effect and spatial non-stationarity, which render traditional plane-wave assumptions invalid [7,8]. Recent channel modeling efforts emphasize the necessity of using 3D non-stationary models with visibility regions to accurately capture these near-field dynamics [9], which also fundamentally impacts multiple access strategies [10]. In this context, accurate Channel State Information (CSI) acquisition is not merely a communication prerequisite but a complex channel sensing task. The BS requires precise downlink CSI to design precoding matrices. However, the dimension of the channel matrix grows linearly with the number of antennas, leading to a “sensing data deluge” that exceeds the capacity of limited feedback control channels [11,12]. While Compressive Sensing (CS) techniques have been employed to reduce this sensing overhead, they often rely on strict sparsity assumptions that may not hold in complex near-field environments where scattering clusters are non-uniformly distributed. In recent years, Deep Learning (DL) has revolutionized the physical layer of wireless communications, offering a new paradigm for sensing data compression [13]. Seminal works, such as CsiNet [14], utilized autoencoders to compress channel matrices. Subsequent studies integrated attention mechanisms and multi-resolution architectures to further enhance reconstruction accuracy [15,16]. Building on these foundations, recent literature has rapidly expanded to address specific near-field challenges. For instance, lightweight autoencoders have been developed to alleviate feedback overhead in FDD systems [17], while advanced machine learning models have been proposed specifically for near-field CSI feedback and channel estimation [18,19]. 
Moreover, emerging AI-native paradigms, including Generative AI, have demonstrated tremendous potential in predicting CSI amidst severe spatial non-stationarity [20,21]. Despite these advancements, directly applying existing DL-based CSI feedback schemes to 6G XL-MIMO reveals two primary limitations:
- Neglect of Near-Field Sensing Sparsity: Most existing networks treat the channel matrix as a generic image, ignoring the specific Angle-Delay Domain sparsity caused by the limited scattering clusters in XL-MIMO environments. This leads to the inefficient allocation of neural network resources to noise rather than significant sensing features [22].
- Inefficiency of the Reconstruct-then-Precode Paradigm: Conventional approaches aim to minimize the Mean Squared Error (MSE) of the reconstructed channel. However, the ultimate objective of the sensing process in FDD systems is to maximize beamforming gain (spectral efficiency), not merely to reconstruct the raw data. Reconstructing the full high-dimensional channel at the BS before calculating the precoding matrix (e.g., via Singular Value Decomposition, SVD) is computationally expensive and introduces unnecessary latency [23]. Although recent deep neural network-based strategies have made strides in low-overhead beam management [24], integrating these into a true end-to-end precoding paradigm remains inefficient.
To address these challenges, this paper proposes an efficient, end-to-end limited feedback beamforming solution specifically tailored for the unique sensing characteristics of XL-MIMO. We argue that by synergizing physical domain knowledge (sparsity) with data-driven deep learning, high-performance beamforming can be achieved with significantly reduced sensing overhead.
The main contributions of this paper are summarized as follows:
- (1) Analysis of XL-MIMO Spatial-Frequency Sensing Sparsity: We systematically analyze the energy distribution of XL-MIMO channels using hybrid channel models (COST 2100 and QuaDRiGa). Empirical analysis verifies that the channel energy is highly concentrated in specific regions of the Angle-Delay domain, motivating a physics-driven sensing compression strategy.
- (2) Design of Spatial–Frequency Concentration Block (SFCB): Instead of processing the full raw CSI, we introduce the SFCB, a pre-processing module that acts as a “hard attention” mechanism. It dynamically screens features based on energy gradients, achieving efficient dimensionality reduction at the source (UE side) and significantly reducing the input size for the subsequent neural network.
- (3) Development of Direct Integrated Precoding Network (DIP-I): We propose a lightweight end-to-end network, DIP-I, which maps compressed features directly to the precoding matrix. This design bypasses the explicit channel reconstruction stage, avoiding error accumulation and reducing the computational complexity of SVD operations at the BS.
- (4) Validation in Realistic Scenarios: We evaluate the proposed scheme under complex indoor (COST 2100) and outdoor non-line-of-sight (NLOS) (QuaDRiGa) scenarios with a 512-antenna array. Results confirm that our approach outperforms separated feedback-precoding schemes in terms of effective sum-rate and computational efficiency.
Such high spatial resolution and near-field sensing capabilities enable diverse real-world applications. For instance, in smart factory environments, XL-MIMO can provide centimeter-level localization and high-throughput connectivity for industrial robots. In dense urban hotspots like stadiums or transit hubs, the proposed beamforming scheme can effectively mitigate interference through ultra-fine spatial multiplexing, ensuring robust service for thousands of concurrent users.
The remainder of this paper is organized as follows. Section 2 describes the XL-MIMO system model and channel characteristics. Section 3 details the proposed SFCB mechanism. Section 4 presents the DIP-I neural network architecture. Section 5 discusses the simulation results, and Section 6 concludes the paper.
2. System Model and XL-MIMO Channel Characteristics
2.1. XL-MIMO System Model
We consider a downlink XL-MIMO system where the Base Station (BS) is equipped with an extremely large uniform linear array (ULA) composed of $N_a$ antenna elements, serving a single-antenna user [25]. Table 1 presents the mathematical notations and definitions.
Table 1. Mathematical Notations and Definitions.
2.1.1. Array Layout and Field Partitioning
Assume the ULA antenna spacing is $d$, typically set to $d = \lambda/2$ ($\lambda$ is the carrier wavelength). The physical aperture of XL-MIMO is much larger than that of traditional arrays. Based on the distance $r$ between the user and the BS, the spatial propagation environment is divided into the far-field region and the near-field region, with the boundary defined by the Rayleigh distance ($d_R$):

$$d_R = \frac{2D^2}{\lambda}, \qquad D = (N_a - 1)\,d,$$

where $D$ denotes the array aperture.
In XL-MIMO scenarios, $d_R$ increases significantly (e.g., reaching several kilometers for a 1024-antenna half-wavelength array at 30 GHz). When the user is located in the near-field region ($r < d_R$), the electromagnetic wavefront exhibits spherical curvature, rendering the traditional Plane Wave Model (PWM) inapplicable. Consequently, the Spherical Wave Model (SWM) must be adopted to accurately describe phase variations [26].
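As a quick numerical check of the near-field boundary, the following sketch evaluates the standard Rayleigh distance $2D^2/\lambda$ for a half-wavelength ULA. The function name `rayleigh_distance` and the aperture definition $D = (N_a - 1)d$ are assumptions made for illustration; the array sizes and carrier frequencies are the ones mentioned in this paper.

```python
def rayleigh_distance(num_antennas: int, carrier_hz: float,
                      spacing_in_wavelengths: float = 0.5) -> float:
    """Rayleigh distance 2*D^2/lambda (in meters) of a uniform linear array.

    Assumes the aperture is D = (num_antennas - 1) * spacing.
    """
    c = 299_792_458.0                      # speed of light in m/s
    wavelength = c / carrier_hz
    aperture = (num_antennas - 1) * spacing_in_wavelengths * wavelength
    return 2.0 * aperture ** 2 / wavelength


# Configurations mentioned in the text (antenna count, carrier frequency).
for n_ant, f_c in [(512, 2.1e9), (512, 5.3e9), (1024, 30e9)]:
    d_r = rayleigh_distance(n_ant, f_c)
    print(f"N_a = {n_ant:4d}, f_c = {f_c / 1e9:4.1f} GHz -> d_R = {d_r:9.1f} m")
```

Under these assumptions, users placed within a few tens of meters of a 512-element array are well inside the near-field region, which is consistent with the use of the spherical wave model.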
2.1.2. Signal Transmission Model
In an Orthogonal Frequency Division Multiplexing (OFDM) system containing $N_s$ subbands, the received signal for the $k$-th subband is expressed as:

$$y_k = \mathbf{h}_k^H \mathbf{v}_k x_k + n_k,$$

where $\mathbf{h}_k \in \mathbb{C}^{N_a \times 1}$ is the downlink channel vector for the $k$-th subband; $\mathbf{v}_k \in \mathbb{C}^{N_a \times 1}$ is the precoding vector designed by the BS; $x_k$ is the transmitted symbol satisfying the power constraint $\mathbb{E}[|x_k|^2] = 1$; and $n_k$ denotes additive white Gaussian noise. The objective of this paper is to optimize $\mathbf{v}_k$ under limited feedback constraints to maximize system spectral efficiency.
2.2. Multi-Scenario Channel Modeling
To verify algorithm robustness, this paper adopts complementary channel modeling methods:
- (1) Indoor Scenario (COST 2100): For the 5.3 GHz band, the geometry-based stochastic COST 2100 model is adopted [27,28]. This model effectively captures the spatial non-stationarity caused by the large aperture of XL-MIMO arrays, where different antenna subsets may observe different scattering clusters.
- (2) Outdoor Scenario (QuaDRiGa): For the 2.1 GHz outdoor NLOS scenario, the QuaDRiGa platform complying with the 3GPP TS 38.901 standard is utilized [29]. This platform accurately simulates SWM propagation characteristics and Visibility Region (VR) effects. The simulation area covers a specific range with the BS configured with 512 antennas.
2.3. Limited Feedback Architecture
The complete limited feedback link consists of three stages (as shown in Figure 1):
Figure 1. Limited feedback architecture diagram.
- (1) Channel Estimation: The UE performs downlink channel estimation to obtain the downlink CSI matrix $\mathbf{H}$.
- (2) Compressed Feedback: The UE utilizes the proposed dimensionality reduction mechanism (SFCB) and an encoder to map the CSI into a low-dimensional codeword $\mathbf{s}$.
- (3) Reconstruction and Beamforming: The BS uses a pre-trained network to generate the precoding matrix directly from the feedback codeword.
3. Dimensionality Reduction Pre-Processing Mechanism Based on SFCB
3.1. Data Distribution Characteristics in Angle-Delay Domain
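To mitigate the curse of dimensionality, it is first necessary to analyze the intrinsic distribution of the signal. A 2D Discrete Fourier Transform (2D-DFT) is used to map the spatial-frequency domain CSI matrix $\mathbf{H} \in \mathbb{C}^{N_s \times N_a}$ to the Angle-Delay Domain matrix $\hat{\mathbf{H}} \in \mathbb{C}^{N_s \times N_a}$:

$$\hat{\mathbf{H}} = \mathbf{F}_s \mathbf{H} \mathbf{F}_a^H,$$

where $\mathbf{F}_s \in \mathbb{C}^{N_s \times N_s}$ and $\mathbf{F}_a \in \mathbb{C}^{N_a \times N_a}$ represent the DFT matrices for the delay (subband) and angle (antenna) dimensions, respectively [30,31].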
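The transform can be illustrated with a minimal NumPy sketch. It assumes unitary DFT matrices and the convention $\hat{\mathbf{H}} = \mathbf{F}_s \mathbf{H} \mathbf{F}_a^H$ with subbands on the rows and antennas on the columns; the sign and normalization conventions, the function name `to_angle_delay`, and the single-path toy channel are assumptions for illustration only.

```python
import numpy as np


def to_angle_delay(H: np.ndarray) -> np.ndarray:
    """Map a (subband x antenna) CSI matrix to the angle-delay domain."""
    n_sub, n_ant = H.shape
    # Unitary DFT matrices (norm="ortho" scales by 1/sqrt(N)).
    F_s = np.fft.fft(np.eye(n_sub), norm="ortho")
    F_a = np.fft.fft(np.eye(n_ant), norm="ortho")
    return F_s @ H @ F_a.conj().T


# Toy channel: one path with delay bin 3 and angle bin 40 (hypothetical values).
n_sub, n_ant = 13, 512
i = np.arange(n_sub)[:, None]
n = np.arange(n_ant)[None, :]
H = np.exp(2j * np.pi * i * 3 / n_sub) * np.exp(-2j * np.pi * n * 40 / n_ant)

H_ad = to_angle_delay(H)
peak = np.unravel_index(np.argmax(np.abs(H_ad)), H_ad.shape)
share = np.abs(H_ad[peak]) ** 2 / np.sum(np.abs(H_ad) ** 2)
print("peak (delay bin, angle bin):", peak, "energy share:", round(float(share), 6))
```

Because the transform is unitary, the total energy is preserved; for this idealized single-path channel it collapses into one angle-delay bin, which is the kind of concentration the subsequent pruning exploits.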
For each data entry, the energy distribution proportions in both sub-band and antenna dimensions are statistically analyzed, and the results from multiple data samples are accumulated (as shown in Figure 2 and Figure 3).
Figure 2. Energy distribution in the antenna dimension.
Figure 3. Energy distribution in the sub-band dimension.
It can be observed from the figures that the XL-MIMO channel exhibits significant Non-uniform Concentration in the Angle-Delay Domain: energy is not diffusely distributed but highly concentrated in a few “power centroid” regions. This physical characteristic provides a theoretical basis for selective pruning based on energy gradients [32].
3.2. Design of Spatial–Frequency Concentration Block (SFCB)
Based on the aforementioned XL-MIMO data distribution characteristics, a pluggable CSI compression precoding module is designed, with the core component being the SFCB (Spatial–Frequency Concentration Block).
3.2.1. Algorithm Workflow
The design goal of SFCB is to maximize the retention of core intrinsic features reflecting XL-MIMO physical characteristics while drastically reducing data dimensions. Mathematically, this serves as a discrete optimization pre-processing step that maximizes the retained channel energy subject to specific dimensionality constraints. For a single CSI sample $\hat{\mathbf{H}}$, SFCB employs a dynamic energy-containment mechanism to adaptively determine the pruning window without manual tuning. The specific process is detailed in Algorithm 1:
| Algorithm 1. Adaptive Spatial-Frequency Concentration Block (SFCB) | |
| Require: Angle-Delay CSI matrix $\hat{\mathbf{H}} \in \mathbb{C}^{N_s \times N_a}$, energy concentration ratio $\Gamma$. Ensure: Concentrated feature matrix $\mathbf{H}_{out} \in \mathbb{C}^{M_s \times M_a}$. | |
| 1: | Step 1: Joint-Domain Energy Mapping. Calculate power density $\mathbf{E} = \lvert \hat{\mathbf{H}} \rvert^2$; |
| 2: | Step 2: Marginal Energy Projection. |
| 3: | $\mathbf{e}_s = \sum_{j=1}^{N_a} \mathbf{E}(\cdot, j)$, $\mathbf{e}_a = \sum_{i=1}^{N_s} \mathbf{E}(i, \cdot)$; |
| 4: | Step 3: Adaptive Aperture Determination. |
| 5: | for each dimension $d \in \{s, a\}$ do |
| 6: | $\mathbf{e}_d^{sort} = \text{sort\_desc}(\mathbf{e}_d)$; |
| 7: | Find minimal $M_d$ s.t. $\left(\sum_{k=1}^{M_d} e_d^{sort}(k)\right) / \left(\sum_{k=1}^{N_d} e_d(k)\right) \ge \Gamma$; |
| 8: | Identify feature-rich index set $\mathcal{I}_d$ based on $M_d$; |
| 9: | end for |
| 10: | Step 4: Spatial Topology Alignment. |
| 11: | $\mathcal{I}_s = \text{sort\_asc}(\mathcal{I}_s)$, $\mathcal{I}_a = \text{sort\_asc}(\mathcal{I}_a)$; ▷ Recover original physical structure |
| 12: | Step 5: Dimensionality Resynthesis. |
| 13: | Slice feature matrix $\mathbf{H}_{out} = \hat{\mathbf{H}}(\mathcal{I}_s, \mathcal{I}_a)$; |
| 14: | return $\mathbf{H}_{out}$ |
Step 1: Joint-Domain Energy Mapping. To identify the power distribution across space and frequency, we first construct the energy distribution map $\mathbf{E}$ by calculating the squared modulus of each element in $\hat{\mathbf{H}}$:

$$\mathbf{E}(i,j) = |\hat{\mathbf{H}}(i,j)|^2, \quad i = 1,\dots,N_s,\; j = 1,\dots,N_a.$$

Step 2: Marginal Energy Projection. To evaluate the contribution of individual dimensions, the energy matrix is aggregated along the row and column dimensions.
Row-wise (Sub-band) Mapping: Summing along the antenna dimension yields the subband energy vector $\mathbf{e}_s \in \mathbb{R}^{N_s}$ with $e_s(i) = \sum_{j=1}^{N_a} \mathbf{E}(i,j)$, reflecting the frequency-domain energy contribution.
Column-wise (Antenna) Mapping: Summing along the subband dimension yields the antenna energy vector $\mathbf{e}_a \in \mathbb{R}^{N_a}$ with $e_a(j) = \sum_{i=1}^{N_s} \mathbf{E}(i,j)$, reflecting the spatial-domain energy contribution.
Step 3: Adaptive CDF-based Feature Pruning. To determine the retained dimensions ($M_s$, $M_a$) for varying channel conditions, we utilize a Cumulative Distribution Function (CDF) thresholding strategy. Let $\mathbf{e}_d^{sort}$ be the vector $\mathbf{e}_d$, $d \in \{s, a\}$, sorted in descending order. The target dimension $M_d$ is adaptively determined as the smallest value satisfying a predefined energy containment ratio $\Gamma$ (e.g., $\Gamma = 0.95$):

$$M_d = \min\left\{ M : \frac{\sum_{k=1}^{M} e_d^{sort}(k)}{\sum_{k=1}^{N_d} e_d(k)} \ge \Gamma \right\}.$$

By calculating this for both $\mathbf{e}_s$ and $\mathbf{e}_a$, the mechanism automatically shrinks the window in sparse (LoS) conditions and expands it in rich-scattering (NLOS) environments to preserve significant path components. The index sets $\mathcal{I}_s$ and $\mathcal{I}_a$ corresponding to the $M_s$ and $M_a$ largest energy values are then selected.
Step 4: Spatial Topology Alignment. To ensure the extracted features maintain the underlying spatial-frequency correlations, the selected indices are re-sorted in ascending order:

$$\mathcal{I}_s \leftarrow \text{sort\_asc}(\mathcal{I}_s), \qquad \mathcal{I}_a \leftarrow \text{sort\_asc}(\mathcal{I}_a).$$

This step preserves the physical topology of the channel, which is essential for downstream feature learning via Convolutional Neural Networks.
Step 5: Dimensionality Resynthesis. Finally, the dimensionality-reduced matrix is synthesized by extracting the elements at the intersection of the screened indices:

$$\mathbf{H}_{out} = \hat{\mathbf{H}}(\mathcal{I}_s, \mathcal{I}_a) \in \mathbb{C}^{M_s \times M_a}.$$
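The five steps of Algorithm 1 can be sketched in NumPy as follows. This is an illustrative reading of the algorithm, not the authors' implementation: the function name `sfcb`, the helper `select_indices`, and the toy angle-delay matrix are hypothetical, and the sketch assumes the input has non-zero total energy.

```python
import numpy as np


def sfcb(H_ad: np.ndarray, gamma: float = 0.95):
    """Prune an angle-delay CSI matrix to its energy-dominant rows and columns."""
    E = np.abs(H_ad) ** 2              # Step 1: element-wise energy map
    e_s = E.sum(axis=1)                # Step 2: sub-band energy (sum over antennas)
    e_a = E.sum(axis=0)                #         antenna energy (sum over sub-bands)

    def select_indices(e: np.ndarray) -> np.ndarray:
        order = np.argsort(e)[::-1]                # Step 3: descending energy order
        cdf = np.cumsum(e[order]) / e.sum()        # cumulative energy fraction
        # Smallest M whose cumulative fraction reaches gamma (clamped to len(e)).
        m = min(int(np.searchsorted(cdf, gamma)) + 1, e.size)
        return np.sort(order[:m])                  # Step 4: restore ascending order

    idx_s = select_indices(e_s)
    idx_a = select_indices(e_a)
    H_out = H_ad[np.ix_(idx_s, idx_a)]             # Step 5: slice the sub-matrix
    return H_out, idx_s, idx_a


# Toy input: weak background plus three strong angle-delay bins (hypothetical).
H_ad = np.full((13, 512), 0.01, dtype=complex)
H_ad[2, 100] = H_ad[3, 101] = H_ad[2, 102] = 10.0

H_out, idx_s, idx_a = sfcb(H_ad, gamma=0.95)
print("kept sub-bands:", idx_s, "kept antennas:", idx_a, "shape:", H_out.shape)
```

Note that the retained size varies from sample to sample with this rule, so a network with a fixed input shape would additionally need padding or a fixed upper bound on ($M_s$, $M_a$); how this is handled is not specified here.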
3.2.2. Module Advantages
SFCB adopts a decoupled design, embedded as an independent pluggable module at the front end of the neural network on the UE side (as shown in Figure 4). Its advantages include:
Figure 4. Schematic diagram of SFCB and feedback network.
- (1) Autonomous Adaptation: The CDF-based thresholding eliminates the need for manual tuning of the retained dimensions ($M_s$, $M_a$), allowing the system to handle non-stationary XL-MIMO channels across different UE locations and clusters.
- (2) Reduced Load: Directly reduces the input dimensions and parameter count of the subsequent encoding network.
- (3) Optimized Latency: Significantly lowers the Floating Point Operations (FLOPs) during the inference phase.
4. Proposed Two-Stage Beamforming Scheme Based on Dimensionality Reduction Precoding
4.1. Overall Architecture
This paper proposes a complete limited feedback beamforming scheme for XL-MIMO scenarios, named DIP-Net (as shown in Figure 5). The scheme decouples the processing flow into two stages: “Hard Compression” and “Soft Reconstruction”:
Figure 5. Design diagram of the limited feedback beamforming scheme.
- Stage 1 (Hard Compression): SFCB performs physical-level feature dimensionality reduction, acting as a hard attention mechanism that forces the network to focus on the main path components where channel energy is concentrated.
- Stage 2 (Soft Reconstruction): The deep neural network DIP-I performs end-to-end feature extraction and precoding matrix generation.
4.2. Stage 1: Dimensionality Reduction Precoding
By introducing a compression ratio, the dynamic masking mechanism of SFCB outputs the feature matrix $\mathbf{H}_{out}$. This mechanism acts as a hard attention module, ensuring the network prioritizes significant channel components.
4.3. Stage 2: DIP-I Network Design
This section proposes DIP-I, the Direct Integrated Precoding Network. Unlike traditional CSI feedback (which first recovers the CSI matrix $\mathbf{H}$ and then computes the precoding matrix $\mathbf{V}$), DIP-I aims to directly regress the optimal precoding matrix $\mathbf{V}$ from the compressed codeword [29].
4.3.1. Mathematical Description of System Flow
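First, the raw CSI undergoes SFCB dimensionality reduction to obtain $\mathbf{H}_{out}$. Subsequently, the encoder $f_{en}(\cdot)$ maps it to a codeword $\mathbf{s}$:

$$\mathbf{s} = f_{en}(\mathbf{H}_{out}; \Theta_{en}).$$

After quantization and feedback, the decoder $f_{de}(\cdot)$ at the BS side directly generates the precoding matrix $\hat{\mathbf{V}} = [\hat{\mathbf{v}}_1, \dots, \hat{\mathbf{v}}_{N_s}]$:

$$\hat{\mathbf{V}} = f_{de}(\mathcal{Q}(\mathbf{s}); \Theta_{de}),$$

where $\mathcal{Q}(\cdot)$ denotes the quantizer.
System performance is evaluated via the sum rate:

$$R = \sum_{k=1}^{N_s} \log_2\left(1 + \frac{|\mathbf{h}_k^H \hat{\mathbf{v}}_k|^2}{\sigma^2}\right),$$

where $\sigma^2$ denotes the system noise power in the data transmission process.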
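The sum-rate metric can be evaluated with a few lines of NumPy. The sketch below assumes the rate is summed over subbands as $\sum_k \log_2(1 + |\mathbf{h}_k^H \mathbf{v}_k|^2 / \sigma^2)$ with unit-norm per-subband precoders; the function name `sum_rate`, the random toy channel, and the matched-filter reference precoder are illustrative assumptions rather than the paper's evaluation setup.

```python
import numpy as np


def sum_rate(H: np.ndarray, V: np.ndarray, noise_power: float) -> float:
    """Sum rate in bit/s/Hz; row k of H and V holds h_k and v_k."""
    # Beamforming gain |h_k^H v_k|^2 for every sub-band k.
    gains = np.abs(np.sum(H.conj() * V, axis=1)) ** 2
    return float(np.sum(np.log2(1.0 + gains / noise_power)))


rng = np.random.default_rng(0)
n_sub, n_ant = 13, 512

# Toy Rayleigh-like channel (assumption; not COST 2100 or QuaDRiGa data).
H = (rng.standard_normal((n_sub, n_ant))
     + 1j * rng.standard_normal((n_sub, n_ant))) / np.sqrt(2)

# Reference: per-sub-band matched-filter precoder, optimal for one user.
V_mf = H / np.linalg.norm(H, axis=1, keepdims=True)

# Baseline: random unit-norm precoder that ignores the channel.
V_rand = rng.standard_normal((n_sub, n_ant)) + 1j * rng.standard_normal((n_sub, n_ant))
V_rand /= np.linalg.norm(V_rand, axis=1, keepdims=True)

rate_mf = sum_rate(H, V_mf, noise_power=0.1)
rate_rand = sum_rate(H, V_rand, noise_power=0.1)
print(f"matched filter: {rate_mf:.2f} bit/s/Hz, random: {rate_rand:.2f} bit/s/Hz")
```

The gap between the two values illustrates why precoder accuracy, rather than channel reconstruction error alone, is the quantity that ultimately determines throughput.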
4.3.2. DIP-I Network Architecture
DIP-I adopts a supervised learning strategy, with the network structure shown in Figure 6:
Figure 6. DIP-I limited feedback beamforming network structure.
- Training Labels: Singular Value Decomposition (SVD) is performed on the unpruned perfect CSI to extract the principal eigenvector as the ideal label.
- Encoder (UE side): The input is the data pruned by SFCB. The structure includes 3 convolutional layers (Conv2D, kernel counts 2-8-2) and 1 fully connected layer, responsible for feature extraction and codeword compression.
- Decoder (BS side): First recovers dimensions through a fully connected layer, followed by 3 cascaded Residual Blocks for deep feature reconstruction. The convolutional layers are configured with kernel counts 2-8-16, and the receptive field size is . Finally, a convolutional layer with 2 kernels and a receptive field of outputs the precoding matrix. The residual block design effectively alleviates the gradient vanishing problem and enhances the learning capability for high-dimensional non-linear mapping [30].
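The training-label step described above can be sketched as follows. The sketch assumes the label is the principal right singular vector of the unpruned CSI matrix and that it is split into real and imaginary channels to match a two-channel network output; both the function name `svd_label` and the two-channel layout are assumptions for illustration.

```python
import numpy as np


def svd_label(H: np.ndarray) -> np.ndarray:
    """Return the principal right singular vector of H (unit norm)."""
    # numpy returns H = U @ diag(S) @ Vh, so the first right singular
    # vector is the conjugate of the first row of Vh.
    _, _, Vh = np.linalg.svd(H, full_matrices=False)
    return Vh[0].conj()


rng = np.random.default_rng(1)
H = rng.standard_normal((13, 512)) + 1j * rng.standard_normal((13, 512))

v_star = svd_label(H)
# Two real-valued channels (real part, imaginary part) as a network target.
label = np.stack([v_star.real, v_star.imag])

top_singular_value = np.linalg.svd(H, compute_uv=False)[0]
print("label shape:", label.shape,
      "| ||H v|| =", round(float(np.linalg.norm(H @ v_star)), 4),
      "| s_1 =", round(float(top_singular_value), 4))
```

Singular vectors are only defined up to a unit-modulus phase factor, so a practical pipeline would need a consistent phase convention for the labels; the source does not state which one is used.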
4.3.3. Optimization Objective and Training Procedure
To address the end-to-end precoding task, the training procedure is mathematically formulated as an optimization problem aimed at minimizing the discrepancy between the network’s predicted precoding matrix and the optimal SVD-derived matrix. Let $\Theta$ represent the learnable parameters of the entire DIP-I network.
- (1) Loss Function: The network parameters are optimized by minimizing the Mean Squared Error (MSE) loss function, which quantifies the Euclidean distance between the predicted precoding vector $\hat{\mathbf{v}}_b$ and the ideal label $\mathbf{v}_b^{\star}$. For a training batch of size $B$, the objective function is defined as:

$$\mathcal{L}(\Theta) = \frac{1}{B} \sum_{b=1}^{B} \left\| \hat{\mathbf{v}}_b - \mathbf{v}_b^{\star} \right\|_2^2.$$

- (2) Optimization Algorithm and Hyperparameters: The optimization procedure employs the Adaptive Moment Estimation (Adam) optimizer, chosen for its robust convergence properties in non-convex neural network optimization. The specific training procedures are implemented as follows:
- Weight Initialization: Network weights are initialized using the Xavier (Glorot) normal distribution to maintain variance consistency across convolutional layers and prevent early-stage gradient explosion.
- Learning Rate Scheduling: The initial learning rate is set to $10^{-3}$. A dynamic learning rate decay strategy (e.g., ReduceLROnPlateau) is applied during the optimization process. If the validation loss fails to decrease for a consecutive number of epochs, the learning rate is scaled down by a factor of 0.5, allowing fine-grained parameter updates as training approaches convergence.
- Batch Training: The dataset is divided into mini-batches (e.g., $B = 128$). In each iteration, stochastic gradients are computed through backpropagation, and the parameters are iteratively updated until early stopping criteria are met or the maximum number of epochs is reached.
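The loss and the plateau-based learning-rate rule described above can be sketched in framework-independent form. The names `batch_mse` and `PlateauHalver` and the `patience` value are hypothetical (the source does not state how many stagnant epochs trigger a decay), and the scheduler is a simplified stand-in for the idea, not a reproduction of PyTorch's `ReduceLROnPlateau` semantics.

```python
import numpy as np


def batch_mse(V_pred: np.ndarray, V_label: np.ndarray) -> float:
    """Mean over the batch of the squared Euclidean distance per sample."""
    sq_err = np.sum(np.abs(V_pred - V_label) ** 2, axis=1)
    return float(np.mean(sq_err))


class PlateauHalver:
    """Scale the learning rate by `factor` after `patience` stagnant epochs."""

    def __init__(self, lr: float = 1e-3, factor: float = 0.5, patience: int = 10):
        self.lr, self.factor, self.patience = lr, factor, patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> float:
        if val_loss < self.best:          # improvement: reset the counter
            self.best = val_loss
            self.bad_epochs = 0
        else:                             # stagnation: count and maybe decay
            self.bad_epochs += 1
            if self.bad_epochs >= self.patience:
                self.lr *= self.factor
                self.bad_epochs = 0
        return self.lr


# Toy usage with made-up validation losses.
scheduler = PlateauHalver(lr=1e-3, factor=0.5, patience=2)
history = [scheduler.step(loss) for loss in [1.0, 0.9, 0.9, 0.9, 0.8]]
print("learning rates:", history)
print("example loss:", batch_mse(np.ones((2, 3)), np.zeros((2, 3))))
```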
5. Simulation Results and Performance Evaluation
5.1. Simulation Parameter Settings
To comprehensively evaluate the performance of the proposed DIP-Net, an extensive dataset was constructed encompassing both indoor and outdoor communication scenarios. Experiments are based on the PyTorch (2.5.0) framework. Table 2 presents the key simulation and physical antenna parameters.
Table 2. Key Simulation and Physical Antenna Parameters.
- Channel Models: COST 2100 [28] (5.3 GHz, Indoor)/QuaDRiGa (2.1 GHz, Outdoor NLOS).
- Antenna Configuration: The BS is equipped with a 512-antenna ULA with half-wavelength antenna spacing, and the system uses 13 sub-bands. In the indoor and outdoor scenarios, the BS is placed at the center of an area with a side length of 20 m and 40 m, respectively, and users are randomly distributed within these areas.
- Dataset and Network Parameters: The dataset contains 120,000 training samples and 30,000 testing samples, split evenly between indoor and outdoor scenarios. Epochs = 1000, Batch Size = 128 and Learning Rate = 0.001. The loss function is MSE, and the optimizer is Adam.
- Training Strategy: A two-step offline training method is adopted, with the SFCB module placed before the encoder to reduce the dimensionality of the XL-MIMO data. In the first step, the neural network without the DIP-I module is trained; its input is the dimension-reduced data, and the supervision label is the precoding matrix obtained by performing SVD on the original, non-reduced CSI matrix. In the second step, the DIP-I module is added to quantize the feedback codewords, and the decoder is then trained for 500 epochs.
5.2. Performance Comparative Analysis
5.2.1. Impact of SFCB on Overhead and Performance
Figure 7 illustrates the comparison of network overhead before and after introducing SFCB. Taking a feedback codeword length of 512 as an example, the introduction of SFCB significantly reduces the input dimensions, resulting in a substantial decrease in the number of network parameters, the inference time, and the storage requirements.
Figure 7. DIP-I neural network training parameters and inference latency under different datasets.
To verify the impact of introducing the SFCB module on beamforming accuracy, this section compares the Normalized Mean Square Error (NMSE) performance of the network in indoor and outdoor scenarios when the input is SFCB-pruned data versus raw data, as shown in Figure 8.
Figure 8. NMSE performance comparison of DIP-I neural network under different input modes.
Comparative experiments show that compared to inputting raw full-volume data, DIP-I with SFCB-pruned data input incurs only a minimal performance loss in NMSE (<0.5 dB). This confirms that SFCB can trade a negligible accuracy cost for significant computational efficiency gains, validating the effective utilization of XL-MIMO channel sparsity in the Angle-Delay Domain.
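For reference, the NMSE values discussed here are typically reported in decibels. The sketch below uses the common definition $\mathrm{NMSE} = \mathbb{E}\left[\|\mathbf{X} - \hat{\mathbf{X}}\|^2 / \|\mathbf{X}\|^2\right]$ averaged over samples; this definition and the function name `nmse_db` are assumptions, since the exact formula is not stated in the text.

```python
import numpy as np


def nmse_db(X_true: np.ndarray, X_pred: np.ndarray) -> float:
    """Normalized mean squared error in dB; axis 0 indexes the samples."""
    axes = tuple(range(1, X_true.ndim))
    err = np.sum(np.abs(X_true - X_pred) ** 2, axis=axes)
    ref = np.sum(np.abs(X_true) ** 2, axis=axes)
    return float(10.0 * np.log10(np.mean(err / ref)))


rng = np.random.default_rng(2)
X = rng.standard_normal((8, 512)) + 1j * rng.standard_normal((8, 512))

# A prediction that is uniformly 10% too small has a relative error of 0.01.
print("NMSE:", round(nmse_db(X, 0.9 * X), 3), "dB")
```

On this scale, a loss of 0.5 dB corresponds to a factor of roughly $10^{0.05} \approx 1.12$ in the normalized error.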
5.2.2. Comparison of Limited Feedback Beamforming Schemes
To verify the architectural advantages of DIP-I, two schemes are compared:
- DIP-S (Separated Scheme): The network is responsible only for reconstructing the CSI ($\mathbf{H}$), and the BS side calculates precoding via SVD.
- DIP-I (Integrated Scheme): The integrated scheme proposed in this paper. The network directly outputs the precoding matrix $\mathbf{V}$.
Experimental results in Figure 9 show that the DIP-I scheme is significantly superior to DIP-S in precoding accuracy.
Figure 9. NMSE performance comparison of two different beamforming schemes.
5.2.3. Throughput Performance Analysis
Table 3 presents the average sum-rate performance under different feedback overheads and signal-to-noise ratios (SNRs). The experiment considers compression dimensions of 256, 512, and 1024, and introduces an ideal scheme (using perfect CSI for direct SVD) as the theoretical upper bound.
Table 3. Comparison of average sum-rates for various schemes with noise power of 0.1 under different feedback overheads (bit/s/Hz).
It can be seen that in the XL-MIMO scenario, the proposed DIP-I scheme outperforms DIP-S under different feedback overheads, and the average sum-rate is positively correlated with feedback accuracy. With larger indoor feedback overhead, the average sum-rate is very close to the theoretical optimal value. The comparison of average sum-rates for each scheme under different noise powers is shown in Figure 10.

Figure 10. Comparison of average sum-rates of various schemes under different noise powers.
The data indicate that in XL-MIMO scenarios, the proposed DIP-I scheme outperforms the DIP-S scheme under various feedback overheads and transmission noise conditions. Specifically, for the DIP-S scheme, since the channel CSI matrix must first be fed back to the BS side before performing precoding operations, the error between the CSI matrix received at the BS and the original CSI matrix leads to quality degradation of the precoding matrix during subsequent SVD operations, thereby affecting system performance. Furthermore, since this scheme requires further precoding of the received CSI matrix at the BS, it demands substantial additional computational resources and time, leading to high system complexity.
In contrast, compared to the two-step approach of DIP-S, the DIP-I scheme uses a neural network to directly output the precoding matrix and uses the matrix obtained from perfect CSI precoding as the label. This integrates the feedback and beamforming processes, achieving better performance through global system optimization. The integrated method can better utilize feedback information and adjust it according to label information to generate superior precoding matrices. Additionally, the computational and time complexity of this scheme is lower than that of the step-by-step feedback beamforming scheme. Therefore, the DIP-I scheme proposed in this paper demonstrates superior performance in XL-MIMO scenarios.
To further evaluate the effectiveness of the proposed DIP-I scheme, we first provide a comprehensive comparison with existing representative literature in Table 4. Unlike traditional CSI feedback frameworks such as CsiNet [14] or Chen et al. [17], which are primarily designed for far-field stationary channels, our proposed method specifically addresses the unique electromagnetic characteristics of 6G XL-MIMO [33,34].
Table 4. Comparison of Proposed Scheme with Existing Literature.
As illustrated in Table 4, the proposed DIP-I distinguishes itself in three key dimensions:
- (1) Hybrid Domain Awareness: While prior works like Wu et al. [26] focus on near-field effects, our scheme simultaneously captures both spherical wavefront (near-field) and spatial non-stationarity, providing a more robust channel representation in realistic XL-MIMO deployments.
- (2) Ultra-Low Feedback Overhead: By leveraging the physics-driven SFCB module to prune redundant spatial-frequency features before the encoding stage, our scheme achieves an ‘Ultra-Low’ feedback overhead, outperforming the compression efficiency of vanilla autoencoders.
- (3) End-to-End (E2E) Efficiency: Unlike the conventional ‘reconstruct-then-precode’ paradigm seen in Refs. [8,26,28], DIP-I integrates feedback compression and precoding matrix generation into a single mapping process. This E2E design not only bypasses the accumulation of reconstruction errors but also significantly reduces the computational latency at the Base Station (BS), facilitating real-time beamforming in high-dimensional antenna systems.
6. Conclusions
This paper addressed the critical challenges of CSI sensing, feedback, and beamforming in 6G XL-MIMO systems. We proposed a synergistic architecture combining physics-based dimensionality reduction (SFCB) with end-to-end deep learning (DIP-I). The SFCB effectively exploits the angle-delay domain sparsity of near-field channels to reduce sensing processing overhead, while the DIP-I network learns a direct mapping to the optimal precoder, bypassing the latency-heavy reconstruction-SVD pipeline. Simulation results on COST 2100 and QuaDRiGa datasets confirm that our approach achieves higher beamforming accuracy than the separated reconstruct-then-precode baseline with significantly lower computational complexity. This solution is particularly promising for delay-sensitive 6G ISAC applications where hardware resources are constrained. Future Work and Trends: Incorporating Semantic Communications (SemCom) into the CSI feedback process, shifting from “reconstructing the original bits” to “conveying core semantic metrics,” represents a significant direction for moving beyond conventional bit-level compression and achieving ultra-low overhead feedback [35,36].
Author Contributions
Conceptualization: Y.W. and X.Z.; Methodology: Y.W.; Software: Y.W.; Validation: Y.W., X.Z. and X.X.; Formal analysis: Y.W.; Investigation: Y.W.; Resources: X.Z.; Data curation: Y.W.; Writing—original draft preparation: Y.W.; Writing—review and editing: Y.W., X.Z. and X.X.; Visualization: Y.W.; Supervision: X.Z.; Project administration: X.Z.; Funding acquisition: X.Z. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Natural Science Foundation of China (NSFC) under Grant No. U21A20448.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Acknowledgments
The authors thank the anonymous reviewers for their insightful suggestions and comments.
Conflicts of Interest
The authors declare no conflicts of interest.
References
- Björnson, E.; Sanguinetti, L.; Wymeersch, H.; Hoydis, J.; Marzetta, T.L. Massive MIMO is a reality—What is next? Five promising research directions for antenna arrays. Digit. Signal Process. 2019, 94, 3–20.
- De Carvalho, E.; Ali, A.; Amiri, A.; Angjelichinoski, M.; Heath, R.W. Non-stationarities in extra-large-scale massive MIMO. IEEE Wirel. Commun. 2020, 27, 74–80.
- Cui, M.; Wu, Z.; Lu, Y.; Wei, X.; Dai, L. Near-field communications for 6G: Fundamentals, challenges, potentials, and future directions. IEEE Commun. Mag. 2023, 61, 40–46.
- Sun, P.; Dai, H.; Wang, B. Integrated sensing and secure communication with XL-MIMO. Sensors 2024, 24, 295.
- Ma, R.; Ma, Y.; Lin, Z. Covert communication assisted by movable time-modulated arrays. IEEE Commun. Lett. 2025, 30, 382–386.
- Liu, X.; Zhang, H.; Xu, J. Self-powered absorptive reconfigurable intelligent surfaces for securing satellite-terrestrial integrated networks. China Commun. 2024, 21, 55–69.
- Lu, H.; Zeng, Y. Communicating with Extremely Large-Scale Array/Surface: Unified Modeling and Performance Analysis. IEEE Trans. Wirel. Commun. 2022, 21, 4039–4053.
- Cui, T.J.; Liu, S. Information metamaterials and metasurfaces. J. Mater. Chem. C 2021, 9, 7616–7630.
- Wang, X.; Chen, H.; Jiang, R. Near-field wideband non-stationary channel estimation for XL-MIMO based on frequency-dependent visibility region. IEEE Trans. Commun. Netw. 2025, 11, 3987–4001.
- Wu, Z.; Dai, L. Multiple access for near-field communications: SDMA or LDMA? IEEE J. Sel. Areas Commun. 2023, 41, 2818–2831.
- Lei, H.; Zhang, J.; Wang, Z. Near-field user localization and channel estimation for XL-MIMO systems: Fundamentals, recent advances, and outlooks. IEEE Wirel. Commun. 2025, 32, 190–198.
- Marzetta, T.L. Noncooperative cellular wireless with unlimited numbers of base station antennas. IEEE Trans. Wirel. Commun. 2010, 9, 3590–3600.
- Wen, C.K.; Shih, W.T.; Jin, S. Deep learning for massive MIMO CSI feedback. IEEE Wirel. Commun. Lett. 2018, 7, 748–751.
- Liu, Z.; Zhang, L.; Ding, Z. An Efficient Deep Learning Framework for Low Rate Massive MIMO CSI Reporting. IEEE Trans. Commun. 2020, 68, 4761–4772.
- Wang, T.; Wen, C.K.; Wang, H.; Gao, F.; Jiang, T.; Jin, S. Deep learning for wireless physical layer: Opportunities and challenges. China Commun. 2017, 14, 92–111.
- Guo, J.; Wen, C.K.; Jin, S.; Li, G.Y. Convolutional neural network-based multiple-rate compressive sensing for massive MIMO CSI feedback: Design, simulation, and analysis. IEEE Trans. Wirel. Commun. 2020, 19, 2827–2840.
- Hua, H.; Xu, J.; Han, T.X. Optimal transmit beamforming for integrated sensing and communication. IEEE Trans. Veh. Technol. 2023, 72, 10588–10603.
- Peng, Z.; Liu, R.; Li, Z.; Pan, C.; Wang, J. Deep learning-based CSI feedback for XL-MIMO systems in the near-field domain. IEEE Wirel. Commun. Lett. 2024, 13, 3613–3617.
- Zhang, X.; Wang, Z.; Zhang, H.; Yang, L. Near-field channel estimation for extremely large-scale array communications: A model-based deep learning approach. IEEE Commun. Lett. 2023, 27, 1155–1159.
- Iqbal, A.; Al-Habashna, A.; Wainer, G.; Boudreau, G. Machine learning in near-field communication for 6G: A survey. arXiv 2025, arXiv:2509.16723.
- Gao, Y.; Lu, Z.; Wu, X.; Yu, W.; Liu, S.; Du, J.; Jin, Y.; Zhang, S.; Chu, X.; Xu, S.; et al. AI-driven channel state information (CSI) extrapolation for 6G: Current situations, challenges and future research. IEEE Commun. Surv. Tutor. 2026, in press.
- Yu, W.; He, H.; Song, S.; Zhang, J.; Dai, L.; Zheng, L.; Letaief, K.B. AI and deep learning for terahertz ultra-massive MIMO: From model-driven approaches to foundation models. Engineering 2026, 56, 14.
- Xia, L.; Yang, D.; Zhang, J.; Yang, H.; Chen, J. Enhanced semantic information transfer of multi-domain samples: An adversarial edge detection method using few high-resolution remote sensing images. Sensors 2022, 22, 5678.
- Cui, M.; Dai, L. Channel estimation for extremely large-scale MIMO: Far-field or near-field? IEEE Trans. Commun. 2022, 70, 2663–2677.
- Liu, S.; Yu, X.; Gao, Z.; Xu, J.; Ng, D.W.K.; Cui, S. Sensing-enhanced channel estimation for near-field XL-MIMO systems. IEEE J. Sel. Areas Commun. 2025, 43, 628–643.
- Jaeckel, S.; Raschkowski, L.; Börner, K.; Thiele, L. QuaDRiGa: A 3-D multi-cell channel model with time evolution for enabling virtual field trials. IEEE Trans. Antennas Propag. 2014, 62, 3242–3256.
- Zhang, R.; Cheng, L.; Wang, S.; Lou, Y.; Gao, Y.; Wu, W.; Ng, D.W.K. Integrated sensing and communication with massive MIMO: A unified tensor approach for channel and target parameter estimation. IEEE Trans. Wirel. Commun. 2024, 23, 8571–8587.
- Liu, L.; Oestges, C.; Poutanen, J.; Haneda, K.; Vainikainen, P.; Quitin, F.; Tufvesson, F.; De Doncker, P. The COST 2100 MIMO channel model. IEEE Wirel. Commun. 2012, 19, 92–99.
- Li, X.; Alkhateeb, A. Deep learning for direct hybrid precoding in millimeter wave massive MIMO systems. In Proceedings of the 53rd Asilomar Conference on Signals, Systems, and Computers, Pacific Grove, CA, USA, 3–6 November 2019; pp. 800–805.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Guo, J.; Wen, C.K.; Jin, S.; Li, G.Y. Overview of deep learning-based CSI feedback in massive MIMO systems. IEEE Trans. Commun. 2022, 70, 8017–8045.
- Van Huynh, N.; Wang, J.; Du, H.; Li, G.Y. Generative AI for physical layer communications: A survey. IEEE Trans. Cogn. Commun. Netw. 2024, 10, 706–728.
- Zhu, K.; Pan, C.; Ren, H.; Chai, K.K.; Wang, C.X.; Schober, R.; You, X. Performance analysis and low-complexity design for XL-MIMO with near-field spatial non-stationarities. IEEE J. Sel. Areas Commun. 2024, 42, 1656–1672.
- Ozpoyraz, B.; Dogukan, A.T.; Gevez, Y.; Altun, U.; Basar, E. Deep learning-aided 6G wireless networks: A comprehensive survey of revolutionary PHY architectures. IEEE Open J. Commun. Soc. 2022, 3, 1749–1809.
- Lu, H.; Zeng, Y.; You, C.; Han, Y.; Zhang, J.; Wang, Z.; Dong, Z.; Jin, S.; Wang, C.X.; Jiang, T.; et al. A tutorial on near-field XL-MIMO communications toward 6G. IEEE Commun. Surv. Tutor. 2024, 26, 2213–2257.
- Alkhateeb, A. DeepMIMO: A generic deep learning dataset for millimeter wave and massive MIMO applications. In Proceedings of the Information Theory and Applications Workshop (ITA), San Diego, CA, USA, 10–15 February 2019; pp. 1–8.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2026 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license.