Article

Flexible Resource Optimization for D2D XL-MIMO Communication via Adversarial Multi-Armed Bandit

1
State Grid Xinjiang Electric Power Co., Ltd., Urumqi 848000, China
2
College of Electronic and Optical Engineering, Nanjing University of Posts and Telecommunications, Nanjing 210023, China
*
Author to whom correspondence should be addressed.
Electronics 2025, 14(8), 1498; https://doi.org/10.3390/electronics14081498
Submission received: 18 February 2025 / Revised: 2 April 2025 / Accepted: 6 April 2025 / Published: 8 April 2025
(This article belongs to the Special Issue Security and Privacy in AI and Large Model-Driven 6G Networks)

Abstract

Extremely large-scale multi-input and multi-output (XL-MIMO) communication, compared to conventional massive multi-input multi-output communication, can support more users and higher data throughput, thereby significantly improving its spectral efficiency and spatial multiplexing capabilities. This paper investigates the optimization of resource allocation for device-to-device (D2D) multicast communication in XL-MIMO cellular networks. The “many-to-many” sharing model permits one subcarrier to be shared among multiple D2D groups (DGs) and each DG to reuse multiple subcarriers. The objective is to maximize the total multicast data rate of DGs while meeting the data rate requirements of cellular users. This optimization problem is formulated as a 0–1 mixed-integer nonlinear programming problem, with the challenge lying in the fact that adjusting the subcarriers and the power of the user equipment alters the network’s carrier occupation and interference relationships, thereby increasing computational complexity. To address this challenge, a phased strategy is proposed. Initially, subcarrier allocation and coarse power allocation are conducted for cellular users. Subsequently, an adversarial multi-player multi-armed bandit framework is employed, treating DGs as players and subcarrier and power combinations as arms, to maximize the total multicast data rate. An improved Exp3 algorithm is utilized for selecting the optimal combination of arms. Finally, precise power allocation for cellular users is conducted based on the allocation results of the DGs. A comparative analysis of various simulations confirms the superiority of our algorithm over the established heuristic subcarrier assignment and proposed power allocation (HSAPP) and the channel allocation scheme using full information of device locations (CAFIL) approaches.

1. Introduction

In the emerging era of sixth-generation (6G) mobile communication technology, the fast expansion of intelligent devices and the widespread adoption of mobile applications are giving rise to a vast amount of diverse content [1,2]. This requires wireless communication systems to provide higher data rates and connection densities [3]. In the exploration of 6G technology, the extremely large scale multi-input and multi-output (XL-MIMO) system has garnered considerable focus, primarily because of its capacity to significantly boost both spectral and energy efficiency [4]. This technology achieves precise signal control and interference management by deploying a high density of antennas at the base station (BS) and utilizing spatial multiplexing among users. This supports more user connections while improving the overall network throughput and maintaining signal quality [5,6]. Meanwhile, the demands of 6G are also driving the development of group-oriented services (namely, multicast and broadcast communication). Multicast communication takes advantage of the broadcast characteristics of wireless transmission, which can reduce data traffic pressure while simultaneously fulfilling the needs of multiple requesters [7].
When a group of users located in close proximity request the same content, conventional Device-to-Device (D2D) communication may result in redundant transmission by the BS, excessive signaling overhead, and bandwidth waste [8]. Therefore, multicast D2D (MD2D), which combines multicast technology and D2D communication, has been proposed [9]. It enables the extensive parallel processing of local content in high-density scenarios while facilitating a vast number of connections. In MD2D communication, resource management and scheduling strategies are primarily centered around two forms of multicast transmission: single-rate and multi-rate transmission [10]. The advantage of single-rate transmission lies in its simplification of resource management and scheduling strategies, ensuring that all users receive the same quality of service (QoS) and thereby improving the fairness and reliability of the system [11]. This paper takes single-rate MD2D XL-MIMO communication into account. In this case, the data rate achievable for each D2D group (DG) is restricted by the worst signal-to-interference-plus-noise ratio (SINR) condition of the receivers.
In MD2D communication systems, resource allocation models are categorized into three paradigms: one-to-one (exclusive subcarrier assignment with limited capacity [12,13,14]); one-to-many (single-subcarrier multi-DG sharing for improved spectral efficiency [15,16]); and many-to-many (M2M). The M2M model fundamentally outperforms its predecessors by enabling bidirectional subcarrier–DG multiplexing, where multiple subcarriers are dynamically shared among multiple DGs. This architecture achieves significant gains in spectral efficiency and resource flexibility while maintaining cellular user (CU) QoS. Advanced techniques such as a hybrid genetic algorithm for joint subcarrier power optimization [17] and a decentralized resource management framework [18] further enhance network throughput, guarantee minimum rate requirements, and ensure interference coordination, establishing M2M as a systematically optimized solution for high-density heterogeneous networks.
However, in refs. [17,18], the cellular BS only considers the use of MIMO or massive MIMO technologies. XL-MIMO expands the number of antennas and their deployment density compared to that of massive MIMO to address future communication challenges. The near-field and far-field channel regimes exhibit fundamental distinctions in their wave propagation characteristics. In conventional massive MIMO systems operating under far-field assumptions, spatial stationarity and uniform plane wave propagation enable simplified channel modeling through angular domain approximations [19]. Conversely, XL-MIMO architectures with ultra-large apertures or high-frequency deployments enter the near-field regime, where spherical wavefronts introduce non-uniform phase variations across array elements and break spatial stationarity assumptions [20,21,22]. Currently, research on resource allocation for MD2D XL-MIMO communication is relatively scarce. The authors in [23] studied the joint subcarrier and power allocation problem in MD2D XL-MIMO communication, aiming to maximize the total multicast data rate of all DGs while ensuring QoS for the CUs. They decomposed the original problem into two sub-problems and solved them separately, using a heuristic subcarrier allocation algorithm and a power allocation algorithm based on the difference of concave functions. The aforementioned studies related to MD2D communication all assume the channel to be quasi-static. In practical applications, however, the changing positions of user devices cause continuous variations in the channel information, which raises the requirements for the algorithm's adaptability.
To accommodate dynamic variations in the environment, a feasible approach is to extract and exploit information from historical data for resource allocation. Such an idea can be realized through reinforcement learning (RL). As a classic problem in RL, the multi-armed bandit (MAB) problem focuses on balancing exploration and exploitation to maximize cumulative rewards over time. By employing classical strategies from the MAB framework, a communication network can adapt to changes in the environment effectively [24]. Recent works have applied MAB strategies to 6G communication networks to efficiently utilize limited resources and provide high-quality service to users [25]. Specifically, the authors in [26] proposed MAB-aided mode selection and addressed the challenge of adaptive relaying mode selection in 6G cooperative networks. To guarantee QoS for video streaming in highly dynamic mobile environments in heterogeneous 6G networks, the authors in [27] developed a QoS-driven contextual MAB framework integrated with multipath QUIC (Quick UDP Internet Connections). On the other hand, the authors in [28] utilize adversarial MAB (AMAB) to cope with non-stationary environments during communication to improve network performance. Moreover, in [29], the authors introduced a “softwarization of intelligence” framework into 6G networks, which leverages MAB to dynamically select optimal policies tailored to real-time network states, and applied it to neighbor discovery and selection in D2D networks. In D2D network applications, classical MAB strategies have also been employed, with the aim of mitigating the impact of interference through efficient resource allocation and power control.
Specifically, the authors in [30] utilize classical strategies, including the upper confidence bound and minimax optimal stochastic strategy, to design a neighbor discovery and selection algorithm in D2D networks, aiming to improve the average system throughput. In [31], the authors utilized a combinatorial MAB strategy to develop an integrated algorithm for mode selection and resource allocation in D2D networks. Compared to the aforementioned strategies, AMAB, as an unsupervised adaptive learning strategy, can employ dynamic adjustment methods to address changes and adversarial factors in the environment. It is more suitable for scenarios requiring real-time decision-making due to unpredictable environments. Consequently, AMAB can effectively handle channel variations and mobility among users and devices in D2D networks. The main works related to this strategy and their characteristics and configurations are listed and compared with those of the present paper in Table 1.
In light of the aforementioned issues, this paper considers optimizing the multicast data rate for all DGs, from the perspective of the entire cellular system, through subcarrier and power allocation, without affecting the QoS of CUs. However, adjusting the subcarriers and power for each user equipment (CU and DG) changes the carrier occupation relationships and interference relations within the network, making the direct solution to this optimization problem highly complex. To efficiently address this issue, this paper introduces an adversarial multi-player multi-armed bandit (MP-MAB) framework. The main contributions of this paper are as follows:
(1) We establish a resource allocation model for MD2D XL-MIMO communication. The goal is to optimize the aggregate multicast data rate across all DGs while adhering to the limitations on the minimum required data rate for CUs, the upper bound of transmission power, and the occupation relationship between CUs and subcarriers. The formulated optimization problem is a 0–1 mixed-integer nonlinear programming (MINLP) problem, and it presents significant challenges for analytical solutions.
(2) We model the original problem using the adversarial MP-MAB framework, where DGs are treated as players and the combinations of subcarriers and power for DGs and CUs are treated as arms. Since the arm space of this MAB problem is a four-dimensional (4D) space, with each arm having four variable parameters, exhaustively exploring all possible parameter combinations becomes impractical. Therefore, this study proposes a staged strategy to reduce the problem's complexity. In the first stage, subcarrier allocation and rough power allocation for CUs are performed. Then, the adversarial MP-MAB framework is applied to solve the resource allocation problem for DGs, simplifying the arm space to a two-dimensional (2D) space. In the second stage, based on the results of the D2D group allocation, precise power allocation for CUs is carried out.
(3) Our simulation results demonstrate that the proposed algorithm performs effective subcarrier and power allocation. Comparisons with the HSAPP and CAFIL algorithms validate the effectiveness of the proposed approach.

2. System Model and Problem Formulation

Consider an MD2D communication system within a cellular network, consisting of a BS employing an extremely large-scale antenna array (with $N$ elements), $K$ single-antenna CUs, and $D$ DGs, as illustrated in Figure 1. The commonly used boundary for distinguishing between the near-field and far-field regions is the Rayleigh distance, i.e., $Z = Q^2/\lambda$, where $\lambda$ is the carrier wavelength, $Q = N\varpi$ is the aperture of the antenna array, and $\varpi$ is the element spacing. For extremely large-scale antenna arrays, the Rayleigh distance may extend to several hundred meters. Therefore, this paper assumes that all user devices (CUs and DGs) are located in the near-field region. The system parameters are shown in Table 2.
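As a rough numerical illustration of the near-field assumption, the sketch below evaluates the boundary expression quoted above, $Z = Q^2/\lambda$ with $Q = N\varpi$. The carrier frequency (30 GHz), the half-wavelength spacing, and $N = 512$ are illustrative assumptions and are not taken from Table 2.

```python
# Illustrative only: carrier frequency, element spacing, and N are assumed values.
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def rayleigh_distance(num_elements: int, spacing_m: float, wavelength_m: float) -> float:
    """Near-field boundary Z = Q^2 / lambda with aperture Q = N * spacing."""
    aperture = num_elements * spacing_m
    return aperture ** 2 / wavelength_m


wavelength = SPEED_OF_LIGHT / 30e9   # roughly 1 cm at the assumed 30 GHz carrier
spacing = wavelength / 2             # assumed half-wavelength element spacing
z = rayleigh_distance(512, spacing, wavelength)
print(f"wavelength = {wavelength * 1e3:.2f} mm, Z = {z:.1f} m")
```

With these assumed values the boundary lands in the range of several hundred meters, which is consistent with treating all users as near-field devices.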
Assume the system has a total of $C$ available orthogonal subcarriers, denoted by the index set $\mathcal{C} = \{1, 2, \ldots, c, \ldots, C\}$. Similarly, let $\mathcal{K} = \{1, 2, \ldots, k, \ldots, K\}$ and $\mathcal{D} = \{1, 2, \ldots, d, \ldots, D\}$ represent the index sets for the CUs and DGs, respectively. For each DG $d$, its associated D2D receivers (DRs) are indexed by $\mathcal{D}_d = \{1, 2, \ldots, m, \ldots, M\}$, where $|\mathcal{D}_d|$ denotes the number of DRs in the group. In the proposed MD2D framework, a D2D transmitter (DT) serves multiple DRs within the same group, while each DR belongs exclusively to one DG. This implies mutual exclusivity between groups: $\mathcal{D}_d \cap \mathcal{D}_{d'} = \emptyset$ for $d \neq d'$. Given the spatial separation between DGs, each DG is uniquely served by a single DT. Furthermore, the spatial dynamics of the system ensure that DRs remain within the coverage of their assigned DT over time. Consequently, each DT and its associated DRs form a circular cluster.
The near-field channel between CU $k$ and the BS, $\mathbf{h}_{k,0} \in \mathbb{C}^{N \times 1}$, can be expressed as follows [32]:
$$\mathbf{h}_{k,0} = \sum_{s=1}^{S} \beta_{k,0,s}\, \mathbf{a}\left(L_{k,0,s}, \theta_{k,0,s}\right), \tag{1}$$
where $S$ represents the total number of paths from CU $k$ to the BS, and $\beta_{k,0,s}$, $L_{k,0,s}$, and $\theta_{k,0,s}$ denote the path gain, distance, and angle associated with path $s$ of CU $k$, respectively. The steering vector $\mathbf{a}\left(L_{k,0,s}, \theta_{k,0,s}\right)$ is expressed as follows:
$$\mathbf{a}\left(L_{k,0,s}, \theta_{k,0,s}\right) = \frac{1}{\sqrt{N}} \left[ e^{-j\frac{2\pi}{\lambda}\left(L_{k,0,s}^{(0)} - L_{k,0,s}\right)}, \ldots, e^{-j\frac{2\pi}{\lambda}\left(L_{k,0,s}^{(N-1)} - L_{k,0,s}\right)} \right]^{T}, \tag{2}$$
where $L_{k,0,s}^{(n)} = \sqrt{L_{k,0,s}^{2} + (n\varpi)^{2} - 2 n \varpi L_{k,0,s} \sin\theta_{k,0,s}}$ represents the distance between the $n$-th BS antenna and CU $k$.
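The channel model above can be sketched in a few lines of standard-library Python. The function names, the tuple layout of `paths`, and the numeric values are hypothetical; the negative sign in the phase follows the usual near-field convention and is an assumption here.

```python
import cmath
import math


def near_field_steering(num_elements, spacing, wavelength, distance, angle):
    """Return a(L, theta) as a list of N complex entries with unit norm."""
    entries = []
    for n in range(num_elements):
        # Distance from the n-th antenna to the user (law of cosines).
        d_n = math.sqrt(distance ** 2 + (n * spacing) ** 2
                        - 2 * n * spacing * distance * math.sin(angle))
        phase = -2 * math.pi / wavelength * (d_n - distance)
        entries.append(cmath.exp(1j * phase) / math.sqrt(num_elements))
    return entries


def near_field_channel(paths, num_elements, spacing, wavelength):
    """Sum the path contributions; `paths` holds (gain, distance, angle) tuples."""
    h = [0j] * num_elements
    for gain, distance, angle in paths:
        a = near_field_steering(num_elements, spacing, wavelength, distance, angle)
        h = [h_n + gain * a_n for h_n, a_n in zip(h, a)]
    return h


# Assumed toy values: 8 elements, 1 cm wavelength, half-wavelength spacing.
a = near_field_steering(8, 0.005, 0.01, 20.0, 0.3)
h = near_field_channel([(0.9 + 0.1j, 20.0, 0.3), (0.2j, 25.0, -0.4)], 8, 0.005, 0.01)
```

The reference antenna ($n = 0$) always has zero phase, and the $1/\sqrt{N}$ factor keeps each steering vector at unit norm.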
In DG $d$, the channel between the DT and DR $m$ can be represented as follows [12]:
$$h_{d,m} = \beta_{d,m} L_{d,m}^{-\alpha}, \tag{3}$$
where $\beta_{d,m} \sim \mathcal{CN}(0, 1)$ represents the path gain associated with small-scale fading, $L_{d,m}$ represents the distance from the DT to DR $m$, and $\alpha$ is the path loss exponent.
In an XL-MIMO system, the optimal precoding scheme can be designed via maximum ratio combining, which exploits the asymptotic orthogonality of channel vectors under the law of large numbers [33]. Specifically, the precoding vector is given by $\mathbf{f} = \mathbf{h}^{H}$. Consequently, the equivalent channel gain from CU $k$ to the BS on subcarrier $c$ can be represented as follows:
$$\psi_{k,0}^{c} = \left| \mathbf{h}_{k,0}^{c} \mathbf{f}_{k,0}^{c} \right|^{2} = \left\| \mathbf{h}_{k,0}^{c} \right\|_{2}^{2}. \tag{4}$$
Similarly, the channel gain from the DT of DG $d$ to the BS on subcarrier $c$ is represented as $\psi_{d,0}^{c}$, the channel gain from CU $k$ to DR $m$ of DG $d$ on subcarrier $c$ is represented as $\psi_{k,m}^{c}$, and the intra-group channel gain from the DT to DR $m$ within DG $d$ on subcarrier $c$ is represented as $\psi_{d,m}^{c}$.
Therefore, the received SINRs on subcarrier $c$ for CU $k$ and DR $m$ of DG $d$ are, respectively, expressed as follows:
$$\gamma_{k}^{c} = \frac{p_{k}^{c} \psi_{k,0}^{c}}{\Gamma_{k}^{c} + \sigma_{0}^{2}}, \tag{5}$$
$$\gamma_{d,m}^{c} = \frac{p_{d}^{c} \psi_{d,m}^{c}}{\Gamma_{d,m}^{c} + \sigma_{0}^{2}}. \tag{6}$$
The aggregate interference components are defined as $\Gamma_{k}^{c} = \sum_{d \in \mathcal{D}} \rho_{d}^{c} p_{d}^{c} \psi_{d,0}^{c}$ (the interference from DGs) and $\Gamma_{d,m}^{c} = \rho_{k}^{c} p_{k}^{c} \psi_{k,m}^{c} + \sum_{d' \in \mathcal{D}, d' \neq d} \rho_{d'}^{c} p_{d'}^{c} \psi_{d',m}^{c}$ (the interference from the CU and other DGs), where $\rho_{k}^{c}, \rho_{d}^{c} \in \{0, 1\}$ are binary subcarrier allocation indicators for CU $k$ and DG $d$, respectively, with $p_{k}^{c}$ and $p_{d}^{c}$ denoting their allocated transmit power on subcarrier $c$. The term $\sigma_{0}^{2}$ represents the additive white Gaussian noise variance.
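The two SINR expressions can be checked numerically with a short sketch. The function names, the tuple layout of the interfering links, and all numeric values are hypothetical and chosen only so the arithmetic is easy to follow.

```python
def cu_sinr(p_k, psi_k0, dg_links, noise_var):
    """SINR of a CU at the BS; dg_links holds (rho_d, p_d, psi_d0) per DG."""
    interference = sum(rho * p * psi for rho, p, psi in dg_links)
    return p_k * psi_k0 / (interference + noise_var)


def dr_sinr(p_d, psi_dm, cu_link, other_dg_links, noise_var):
    """SINR of DR m in DG d; cu_link is (rho_k, p_k, psi_km)."""
    rho_k, p_k, psi_km = cu_link
    interference = rho_k * p_k * psi_km
    interference += sum(rho * p * psi for rho, p, psi in other_dg_links)
    return p_d * psi_dm / (interference + noise_var)


# Assumed toy values: one active and one inactive DG on the CU's subcarrier.
gamma_k = cu_sinr(0.2, 1e-6, [(1, 0.1, 1e-8), (0, 0.1, 5e-8)], 1e-9)
# One co-channel CU and one other active DG interfering with the DR.
gamma_dm = dr_sinr(0.1, 1e-5, (1, 0.2, 1e-8), [(1, 0.1, 1e-8)], 1e-9)
print(round(gamma_k, 6), round(gamma_dm, 6))
```

A DG whose indicator $\rho_d^c$ is 0 contributes nothing to the interference sum, which is how the binary allocation variables couple the subcarrier and power decisions.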
To ensure fairness in resource allocation, the achievable data rate for DG $d$ is computed based on the worst SINR among its receivers [13]. The worst received SINR for DG $d$ on subcarrier $c$ is given by
$$\gamma_{d}^{c} = \min_{m \in \mathcal{D}_{d}} \gamma_{d,m}^{c}. \tag{7}$$
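Then, the achievable data rates for CU $k$ and DG $d$ on subcarrier $c$ are, respectively, formulated as
$$R_{k}^{c} = \log_{2}\left(1 + \gamma_{k}^{c}\right), \tag{8}$$
$$R_{d}^{c} = |\mathcal{D}_{d}| \log_{2}\left(1 + \gamma_{d}^{c}\right). \tag{9}$$
Hence, the total achievable rates for CU $k$ and DG $d$ are obtained by summing over all allocated subcarriers, i.e., $R_{k} = \sum_{c \in \mathcal{C}} \rho_{k}^{c} R_{k}^{c}$ and $R_{d} = \sum_{c \in \mathcal{C}} \rho_{d}^{c} R_{d}^{c}$.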
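A small worked example makes the single-rate multicast rule concrete: the group rate is set by its weakest receiver and then multiplied by the group size. The function names and the SINR values below are hypothetical.

```python
import math


def dg_rate_on_subcarrier(dr_sinrs):
    """Single-rate multicast: |D_d| * log2(1 + worst SINR among the DRs)."""
    return len(dr_sinrs) * math.log2(1 + min(dr_sinrs))


def total_rate(allocation, per_subcarrier_rate):
    """Sum the rates over the subcarriers whose indicator rho equals 1."""
    return sum(rho * rate for rho, rate in zip(allocation, per_subcarrier_rate))


# Three DRs with assumed SINRs 15, 3, and 7: the worst one (3) sets the rate.
rate_c = dg_rate_on_subcarrier([15.0, 3.0, 7.0])
# The DG occupies subcarriers 0 and 2 out of three.
rate_total = total_rate([1, 0, 1], [rate_c, 4.0, 2.0])
print(rate_c, rate_total)  # prints 6.0 8.0
```

Improving the SINR of the two stronger receivers would not change `rate_c`, which is why the allocation has to protect the weakest link of every group.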
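The objective of this paper is to maximize the total multicast data rate of all DGs by jointly optimizing the subcarrier and power allocation while ensuring a minimum QoS of the CUs. Specifically, the optimization problem is formulated as
$$\begin{aligned} \max_{\boldsymbol{\rho}, \mathbf{p}} \quad & \sum_{d \in \mathcal{D}} R_{d} \\ \text{s.t.} \quad & C1: R_{k} \geq R_{\mathrm{req}}, \quad \forall k \in \mathcal{K}, \\ & C2: \sum_{c \in \mathcal{C}} \rho_{k}^{c} p_{k}^{c} \leq p_{k}^{\max}, \quad \forall k \in \mathcal{K}, \\ & C3: \sum_{c \in \mathcal{C}} \rho_{d}^{c} p_{d}^{c} \leq p_{d}^{\max}, \quad \forall d \in \mathcal{D}, \\ & C4: \sum_{k \in \mathcal{K}} \rho_{k}^{c} \leq 1, \quad \forall c \in \mathcal{C}, \\ & C5: p_{k}^{c}, p_{d}^{c} \geq 0, \quad \forall k \in \mathcal{K}, d \in \mathcal{D}, c \in \mathcal{C}, \\ & C6: \rho_{k}^{c}, \rho_{d}^{c} \in \{0, 1\}, \quad \forall k \in \mathcal{K}, d \in \mathcal{D}, c \in \mathcal{C}, \end{aligned} \tag{10}$$
where $R_{\mathrm{req}}$ is the required transmission data rate for the CUs, and $p_{k}^{\max}$ and $p_{d}^{\max}$ are the maximum transmission powers for the CUs and DGs, respectively. The constraint C1 ensures the QoS of the CU, while the constraints C2, C3, and C5 are power limitations for the CUs and DGs. The constraint C4 stipulates that each subcarrier can be assigned to at most one CU, which corresponds to the definition in Equation (5). This optimization problem contains both 0–1 binary variables and continuous variables, making it difficult to solve directly.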
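To make the constraint set tangible, the following sketch checks C1 to C6 for a candidate allocation. The function name, the nested-list layout (`[user][subcarrier]`), and the use of one common power cap per user type are simplifying assumptions for illustration.

```python
def is_feasible(rho_cu, p_cu, rho_dg, p_dg, cu_rates, r_req, p_cu_max, p_dg_max):
    """Return True if a candidate allocation satisfies C1-C6."""
    def used_power(rho_row, p_row):
        return sum(r * p for r, p in zip(rho_row, p_row))

    num_sub = len(rho_cu[0])
    checks = [
        all(rate >= r_req for rate in cu_rates),                          # C1
        all(used_power(r, p) <= p_cu_max for r, p in zip(rho_cu, p_cu)),  # C2
        all(used_power(r, p) <= p_dg_max for r, p in zip(rho_dg, p_dg)),  # C3
        all(sum(row[c] for row in rho_cu) <= 1 for c in range(num_sub)),  # C4
        all(p >= 0 for row in p_cu + p_dg for p in row),                  # C5
        all(r in (0, 1) for row in rho_cu + rho_dg for r in row),         # C6
    ]
    return all(checks)


# Assumed toy instance: two CUs, one DG, three subcarriers.
ok = is_feasible([[1, 0, 0], [0, 1, 0]], [[0.1, 0, 0], [0, 0.15, 0]],
                 [[1, 1, 0]], [[0.05, 0.05, 0]], [2.5, 3.0], 2.0, 0.2, 0.2)
# Same instance, but both CUs claim subcarrier 0, which violates C4.
clash = is_feasible([[1, 0, 0], [1, 0, 0]], [[0.1, 0, 0], [0.1, 0, 0]],
                    [[1, 1, 0]], [[0.05, 0.05, 0]], [2.5, 3.0], 2.0, 0.2, 0.2)
```

Note that the DG in the feasible instance reuses both CU subcarriers, which C4 permits because it restricts only CU-to-subcarrier assignments.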

3. Joint Allocation Algorithm Based on Improved EXP3

The optimization problem in Equation (10) requires joint power and subcarrier allocation for all DGs and CUs, which makes it a 0–1 MINLP problem [34]. To address this combinatorial complexity, we propose an adversarial MP-MAB framework with dimensionality reduction. Given the prohibitive exploration cost in the original 4D arm space, we implement a two-phase decomposition: In the first phase, subcarrier allocation and coarse power allocation are performed for the CUs, and then a joint subcarrier–power allocation for DGs is executed via an improved Exp3 algorithm that reduces the arm space to 2D. In the second phase, precise power allocation is carried out for the CUs. The flowchart of the proposed scheme is shown in Figure 2.

3.1. Problem Reformulation

The optimization problem in Equation (10) is viewed as an MP-MAB problem with the variable sets $\mathcal{C}_d$, $\mathcal{P}_d$, $\mathcal{C}_k$, and $\mathcal{P}_k$, where $\mathcal{C}_d$ and $\mathcal{P}_d$ are the candidate subcarrier set and power set for the DGs, while $\mathcal{C}_k$ and $\mathcal{P}_k$ are the candidate subcarrier set and power set for the CUs. In this MP-MAB problem, each DG $d$ acts as an independent player whose action space $\mathcal{A}_d$ is defined as the Cartesian product $\mathcal{C}_d \times \mathcal{P}_d \times \mathcal{C}_k \times \mathcal{P}_k$, i.e., $\mathcal{A}_d = \{a_{d,i}\}_{i=1}^{|\mathcal{A}_d|}$, where each arm $a_{d,i}$ represents a unique combination of subcarrier–power allocations. We discretize the time axis into slots with a time buffer $T$, during which user devices are assumed to be quasi-static (low-mobility). In each slot, players simultaneously select arms and observe the stochastic rewards determined by their choices.
The reward of DG $d$ is defined as its normalized data rate, which can be expressed as
$$r_{d} \triangleq \frac{R_{d}}{R_{d}^{*}}, \tag{11}$$
where $R_{d}^{*} = \max_{i \in \{1, 2, \ldots, |\mathcal{A}_d|\}} R_{d}(a_{d,i})$ denotes the maximum observed rate across all possible arm selections. The cumulative total reward of DG $d$ over $T$ slots is defined as
$$r_{d}^{\mathrm{sum}} \triangleq \sum_{t=1}^{T} \sum_{i \in \mathcal{A}_{a_d}^{t}} r_{d,i}^{t}, \tag{12}$$
with $\mathcal{A}_{a_d}^{t} \triangleq \{ i : A_{d,i}^{t} \neq 0,\ 1 \leq i \leq |\mathcal{A}_d| \}$ representing the index set of the arms selected by DG $d$ at slot $t$, where $A_{d,i}^{t} \in \{0, 1\}$ indicates the arm's selection status. Each self-interested player (DG) aims to maximize $r_{d}^{\mathrm{sum}}$ through individual arm selection. However, given the form of the data rate $R_d$ of DG $d$, a player's choice of the subcarrier–power combination that maximizes its own $R_d$ may reduce the $R_d$ of other players. This competitive interaction establishes an adversarial MP-MAB formulation. Given that the joint arm space $\mathcal{A}_d = \mathcal{C}_d \times \mathcal{P}_d \times \mathcal{C}_k \times \mathcal{P}_k$ grows rapidly with the number of subcarriers and the granularity of the power discretization, full exploration becomes computationally prohibitive within a finite time buffer. To efficiently solve this MAB problem and reduce its complexity, a phased strategy with a dimension reduction mechanism must be developed.
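The growth of the arm space and the reward normalization can be illustrated with a few lines of Python. As a simplifying assumption, each arm here picks a single subcarrier and a single discrete power level per dimension; the set sizes and power values are hypothetical.

```python
from itertools import product


def build_arms(*parameter_sets):
    """Enumerate arms as the Cartesian product of the given parameter sets."""
    return list(product(*parameter_sets))


def normalized_rewards(rates):
    """Divide each rate by the best observed rate so rewards lie in [0, 1]."""
    best = max(rates)
    return [r / best for r in rates] if best > 0 else [0.0] * len(rates)


subcarriers = range(8)                 # assumed candidate subcarriers
power_levels = [0.05, 0.1, 0.15, 0.2]  # assumed discrete power levels (W)

arms_4d = build_arms(subcarriers, power_levels, subcarriers, power_levels)
arms_2d = build_arms(subcarriers, power_levels)
rewards = normalized_rewards([2.0, 4.0, 1.0])
print(len(arms_4d), len(arms_2d), rewards)  # prints 1024 32 [0.5, 1.0, 0.25]
```

Even in this small setting, fixing the CU dimensions first shrinks the space each DG has to explore from 1024 arms to 32, which is the motivation for the phased strategy.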

3.2. Phase One: Resource Allocation for DGs

For the purpose of allocating subcarriers to CUs and DGs, an equal power allocation scheme is first applied to the CUs; that is, for all $k \in \mathcal{K}$, $p_{k}^{c} = p_{k}^{\max} / C$. Following power initialization, subcarriers are assigned to the CUs through an iterative gap-reduction procedure. The rate deficiency metric for CU $k$ is defined as
$$\Delta R_{k} = R_{\mathrm{req}} - R_{k}. \tag{13}$$
During each iteration, the CU with the maximal $\Delta R_{k}$ prioritizes subcarrier selection to minimize its rate deficit. This mechanism ensures the progressive satisfaction of all CUs' QoS constraints (C1). The complete procedure is formalized in Algorithm 1.
Algorithm 1: Subcarrier allocation algorithm for CUs.
[Algorithm listing: provided as an image in the original article.]
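The listing of Algorithm 1 is not reproduced in the text, so the following is only a sketch reconstructed from the description above: equal power split, then repeated assignment of the best remaining subcarrier to the CU with the largest rate deficit. The function name, the interference-free rate (DGs are not yet allocated), and the toy gains are assumptions.

```python
import math


def allocate_cu_subcarriers(gains, p_max, r_req, noise_var):
    """Greedy gap reduction; gains[k][c] is CU k's channel gain on subcarrier c.

    Returns (assigned, rate); each subcarrier serves at most one CU (C4).
    """
    num_cu, num_sub = len(gains), len(gains[0])
    power = p_max / num_sub                       # equal split p_k^c = p_max / C
    free = set(range(num_sub))
    assigned = {k: [] for k in range(num_cu)}
    rate = [0.0] * num_cu
    while free:
        deficits = [r_req - rate[k] for k in range(num_cu)]
        k = max(range(num_cu), key=lambda u: deficits[u])
        if deficits[k] <= 0:                      # every CU already meets C1
            break
        c = max(free, key=lambda s: gains[k][s])  # best remaining subcarrier
        free.remove(c)
        assigned[k].append(c)
        rate[k] += math.log2(1 + power * gains[k][c] / noise_var)
    return assigned, rate


# Assumed toy instance: two CUs, four subcarriers, unit noise, R_req = 3.
toy_gains = [[8.0, 1.0, 3.0, 0.5], [0.5, 6.0, 1.0, 2.0]]
assigned, rate = allocate_cu_subcarriers(toy_gains, 4.0, 3.0, 1.0)
print(assigned)  # prints {0: [0], 1: [1, 3]}
```

In the toy run the second CU needs two subcarriers to close its deficit, and one subcarrier is left unassigned to any CU. If the subcarriers run out before every deficit is closed, the loop simply stops, so a caller would need to check the returned rates against $R_{\mathrm{req}}$.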
Therefore, after the iteration process is completed, the QoS of all CUs is guaranteed and the constraints C1, C2, and C4 are also adhered to. Thus, the original problem is transformed into
$$\begin{aligned} \max_{\boldsymbol{\rho}, \mathbf{p}} \quad & \sum_{d \in \mathcal{D}} R_{d} \\ \text{s.t.} \quad & C3: \sum_{c \in \mathcal{C}} \rho_{d}^{c} p_{d}^{c} \leq p_{d}^{\max}, \quad \forall d \in \mathcal{D}, \\ & C5: p_{d}^{c} \geq 0, \quad \forall d \in \mathcal{D}, c \in \mathcal{C}, \\ & C6: \rho_{d}^{c} \in \{0, 1\}, \quad \forall d \in \mathcal{D}, c \in \mathcal{C}. \end{aligned} \tag{14}$$
Next, we introduce the MAB framework to solve the optimization problem in Equation (14). The difference from Section 3.1 is that, since the subcarriers and power of the CUs have already been allocated, the set of available arms $\mathcal{A}_{d}^{\mathrm{re}}$ for each player is simply the Cartesian product of the sets $\mathcal{C}_d$ and $\mathcal{P}_d$, denoted as $\mathcal{A}_{d}^{\mathrm{re}} = \mathcal{C}_d \times \mathcal{P}_d$. The arm selection strategy employs an improved Exp3 algorithm to solve the problem in Equation (14) in a distributed manner [35].
First, the proposed algorithm accelerates convergence through prior knowledge integration. As indicated by Equation (18), the updated weight $w_{d,i}^{t+1}$ depends on both the estimated reward $\hat{r}_{d,i}^{t}$ and the current weight $w_{d,i}^{t}$. Additionally, the empirical probability mass function (PMF) of DG $d$, namely
$$u_{d,i}^{t} = (1 - \alpha_{z}) \frac{w_{d,i}^{t}}{W_{d}(t)} + \frac{\alpha_{z}}{|\mathcal{A}_{d}^{\mathrm{re}}|}, \tag{15}$$
is a function of the current weights $w_{d,i}(t)$ and the exploration–exploitation factor $\alpha_{z}$, where $W_{d}(t) = \sum_{j=1}^{|\mathcal{A}_{d}^{\mathrm{re}}|} w_{d,j}(t)$ represents the total arm weights. An adaptive $\alpha_{z}$ schedule is designed to balance exploration breadth and historical knowledge utilization: the initial stages employ larger $\alpha_{z}$ values to ensure sufficient arm exploration, followed by gradual decay to prioritize high-reward arm exploitation. The parameter is configured as follows:
$$\alpha_{z} = \min\left\{ 1, \sqrt{\frac{|\mathcal{A}_{d}^{\mathrm{re}}| \ln |\mathcal{A}_{d}^{\mathrm{re}}|}{(e - 1)\, \vartheta_{z}}} \right\}. \tag{16}$$
By rationally setting the value of $\vartheta_{z}$, the empirical PMF can converge to the actual PMF. This adaptive mechanism enables rapid convergence to optimal subcarrier–power combinations. The complete distributed allocation procedure is formalized in Algorithm 2.
Algorithm 2 presents a method for finding a near-optimal solution to the above optimization problem. The algorithm starts by obtaining the initial weight of each arm in $\mathcal{A}_{d}^{\mathrm{re}}$. For this algorithm, we have no access to prior knowledge, so the initial weight of each arm is set to 1, at which point each arm has the same selection probability. The exact value of $\vartheta_{z}$ is given in line 3. Substituting $\vartheta_{z}$ into Equation (16) gives $\alpha_{z} = 2^{-z}$, from which it can be observed that $\alpha_{z}$ decays exponentially. When $z$ reaches a sufficiently high value, the latter component in Equation (15) approaches 0.
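The exponential decay of $\alpha_z$ can be reproduced numerically. Because line 3 of Algorithm 2 is not reproduced in the text, the threshold used below, $\vartheta_z = \frac{|\mathcal{A}_d^{\mathrm{re}}| \ln |\mathcal{A}_d^{\mathrm{re}}|}{e - 1} 4^{z}$, is an assumption borrowed from the standard Exp3.1 schedule; it is one choice that yields $\alpha_z = 2^{-z}$ when substituted into Equation (16). The function names are hypothetical.

```python
import math


def exploration_factor(num_arms, theta_z):
    """alpha_z = min{1, sqrt(|A| ln|A| / ((e - 1) * theta_z))}; needs |A| >= 2."""
    return min(1.0, math.sqrt(num_arms * math.log(num_arms)
                              / ((math.e - 1) * theta_z)))


def theta_schedule(num_arms, z):
    """Assumed Exp3.1-style threshold: theta_z = |A| ln|A| / (e - 1) * 4**z."""
    return num_arms * math.log(num_arms) / (math.e - 1) * 4 ** z


# With 32 arms the factor halves in every outer round: about 1, 0.5, 0.25, ...
alphas = [exploration_factor(32, theta_schedule(32, z)) for z in range(5)]
print([round(a, 4) for a in alphas])
```

Under this assumed schedule the uniform-exploration share in Equation (15) halves with each outer loop, so later rounds sample almost entirely in proportion to the learned weights.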
Starting from the internal loop in line 5, the selection probability of each arm is first calculated according to Equation (15), and then $\varsigma_{t}^{d}$ arms are selected from $\mathcal{A}_{d}^{\mathrm{re}}$ based on their probability. Since the algorithm can only observe the rewards for the arms selected in each round, it cannot directly observe the rewards of other arms. To better evaluate the potential value of each arm, the estimated reward is computed in line 9, which helps the algorithm take the unobserved rewards into account, thus avoiding premature convergence to suboptimal arms. In addition, updating the weights directly using the observed rewards leads to bias because the reward information for unselected actions is missing. Therefore, weights are updated using estimated rewards, as shown in line 11. Finally, the cumulative estimated rewards of the selected action are updated. The internal loop is repeated until the maximum cumulative reward is greater than the set threshold, and then the algorithm proceeds to the next external loop.
Finally, we briefly analyze the computational complexity of the proposed algorithm (Algorithm 2), which is dominated by the arm selection strategy of the adversarial MP-MAB formulation. In its basic implementation, multiple trials are required to estimate the value of each arm. Given that the computational complexity of calculating the selection probability for each arm and the complexity of selecting an arm based on the probability are both $O(|\mathcal{A}_{d}^{\mathrm{re}}|)$, the complexity of Algorithm 2 is linear in the number of DGs, arms, and slots, i.e., $O(D |\mathcal{A}_{d}^{\mathrm{re}}| T)$.
Algorithm 2: Joint subcarrier and power allocation algorithm based on improved Exp3 for DG d.
[Algorithm listing: provided as an image in the original article.]
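Since the listing of Algorithm 2 (including the weight update of Equation (18)) is not reproduced in the text, the sketch below shows one slot of the textbook Exp3 update rather than the authors' improved variant. It selects a single arm per slot for one player; the function names, the toy reward function, and the fixed exploration factor are assumptions.

```python
import math
import random


def arm_probabilities(weights, alpha):
    """Mix weight-proportional sampling with uniform exploration (cf. Equation (15))."""
    total, k = sum(weights), len(weights)
    return [(1 - alpha) * w / total + alpha / k for w in weights]


def exp3_step(weights, alpha, reward_fn, rng):
    """One slot: sample an arm, observe its reward, update that arm's weight."""
    probs = arm_probabilities(weights, alpha)
    arm = rng.choices(range(len(weights)), weights=probs)[0]
    reward = reward_fn(arm)                      # normalized reward in [0, 1]
    estimate = reward / probs[arm]               # importance-weighted estimate
    weights[arm] *= math.exp(alpha * estimate / len(weights))
    top = max(weights)                           # rescale to avoid overflow;
    weights[:] = [w / top for w in weights]      # probabilities are unchanged
    return arm, reward


# Toy run with hypothetical rewards: arm 2 pays 1.0, every other arm pays 0.2.
rng = random.Random(0)
weights = [1.0] * 4
for _ in range(2000):
    exp3_step(weights, 0.1, lambda a: 1.0 if a == 2 else 0.2, rng)
best_arm = weights.index(max(weights))
```

Dividing the observed reward by the selection probability is what keeps the estimate unbiased for arms that are rarely played, which is the role the text assigns to the estimated reward in line 9. In the multi-player setting each DG would run its own copy of this loop, with the reward of one player depending on the arms chosen by the others.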

3.3. Phase Two: Greedy Power Reallocation for CUs

After the DGs complete their subcarrier and power allocation, power is reassigned to the CUs. According to Equation (5) and the constraint C 4 , there is no interference between the CUs, and interference only originates from the DGs.
Based on the power of DG $d$, which shares subcarrier $c$ with CU $k$, the lower bound of the power $p_{k}^{c}$ for CU $k$ that satisfies the data rate requirement $R_{\mathrm{req}}$ is calculated. Then, power $p_{k}^{c}$ is allocated to CU $k$ and, based on this, the data rate $R_{d}$ for DG $d$ is recalculated.
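For a CU that occupies a single subcarrier, the lower bound follows from inverting Equations (5) and (8): $\log_2(1 + \gamma_k^c) \geq R_{\mathrm{req}}$ gives $p_k^c \geq (2^{R_{\mathrm{req}}} - 1)(\Gamma_k^c + \sigma_0^2)/\psi_{k,0}^c$. The sketch below evaluates this bound; the single-subcarrier restriction, the function name, and the numeric values are assumptions, and how the requirement is split across several subcarriers is not specified in the text.

```python
import math


def min_cu_power(rate_target, psi_k0, dg_interference, noise_var):
    """Smallest p with log2(1 + p * psi / (Gamma + sigma^2)) >= rate_target."""
    return (2 ** rate_target - 1) * (dg_interference + noise_var) / psi_k0


# Assumed toy values: R_req = 3 bit/s/Hz, DG interference equal to the noise power.
psi, gamma_dg, sigma2 = 1e-6, 1e-9, 1e-9
p_min = min_cu_power(3.0, psi, gamma_dg, sigma2)
achieved = math.log2(1 + p_min * psi / (gamma_dg + sigma2))
print(f"p_min = {p_min:.4f} W, achieved rate = {achieved:.3f}")
```

Any power above this bound is wasted from the CU's point of view and raises the interference seen by the DRs, which is why the CU power is lowered to the bound before $R_d$ is recalculated.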

4. Simulation Results

To assess the efficacy of the joint subcarrier and power allocation algorithm based on improved Exp3 (JSPAA-IExp3) proposed in this paper, its performance is benchmarked against the HSAPP algorithm [23] and the CAFIL algorithm [12] through various simulations. In the following simulation experiments, unless otherwise specified, the relevant simulation parameters are shown in Table 3. We first compare the computational complexity of the different schemes in Table 4, then present the convergence behavior of the proposed algorithm, and finally investigate the relationship between the total data rate of the DGs and several simulation parameters, including the number of DGs, the number of CUs, the number of subcarriers, and the data rate requirements of the CUs.
Figure 3 illustrates the dynamic evolution of the total multicast data rate for the DGs across successive time slots under our proposed algorithm. The observed trend, initial rapid growth followed by asymptotic convergence, stems from the carefully calibrated exploration–exploitation trade-off governed by parameter $\alpha_{z}$, which systematically reduces random exploration while prioritizing high-reward subcarrier–power combinations as the slot count increases. Convergence here denotes the algorithm's stabilization toward near-optimal resource configurations. The residual fluctuations arise from two factors: (1) instantaneous variations in near-field spherical wave channel gains due to user mobility, and (2) the adversarial bandit's inherent responsiveness to interference pattern changes caused by DG reconfigurations.
Figure 4 demonstrates the scalability of the total multicast data rate across varying numbers of DGs in XL-MIMO networks, comparing our proposed algorithm with the HSAPP and CAFIL benchmarks. The monotonic increase in the total data rate for all methods aligns with XL-MIMO's spatial multiplexing potential, though our approach consistently outperforms the others due to its adversarial multi-armed bandit architecture. This advantage stems from systematically leveraging historical channel information through the improved Exp3 algorithm's weighted probability updates, enabling the progressive learning of near-optimal subcarrier–power arm pairs (with each arm representing a discrete subcarrier–power pair) across successive slots. Furthermore, the proposed algorithm consistently achieves high performance by explicitly modeling many-to-many subcarrier reuse conflicts through adversarial rewards (Equation (11)). In contrast, conventional methods independently determine suboptimal joint subcarrier and power allocations per slot, ignoring temporal correlation in 6G's quasi-stationary near-field channels.
Figure 5 analyzes the impact of CU density on the total multicast data rate of the DGs in XL-MIMO networks, revealing an inverse correlation where increasing the CU count (from 6 to 10 in simulations) reduces the DG rates across all algorithms. This degradation stems from the proportional growth of subcarriers reserved for CU QoS guarantees (constraint C1 in Equation (10)), constraining the DGs' spectral reuse opportunities. Our proposed JSPAA-IExp3 algorithm and the HSAPP algorithm mitigate this through their “many-to-many” sharing paradigm, achieving higher DG rates than CAFIL's restrictive “one-to-one” model by enabling dynamic subcarrier multiplexing across overlapping DGs. Notably, at 10 CUs, our method maintains 80.85% of the peak DG rate observed with 6 CUs, outperforming HSAPP and CAFIL, demonstrating robust scalability for ultra-dense 6G deployments where CU-DG coexistence is inevitable.
Figure 6 evaluates the scalability of a DG’s multicast performance with respect to the available subcarriers in XL-MIMO networks, demonstrating that increasing the subcarrier count (from 8 to 12 in simulations) enhances total DG data rates across all algorithms. Our proposed JSPAA-IExp3 algorithm achieves 7.9–22.2% higher rates than those of HSAPP and CAFIL at 12 subcarriers, primarily due to its adversarial bandit mechanism, which dynamically optimizes “many-to-many” subcarrier reuse while suppressing inter-group interference through learned power adaptation.
Figure 7 investigates the trade-off between CU QoS constraints and DG multicast performance by varying the minimum data rate requirements of the CUs. The observed inverse relationship—where DG rates decrease as CU demands escalate—stems from the dual impact of resource scarcity and interference escalation: higher CU rates necessitate both exclusive subcarrier allocations and elevated transmit power, which collectively reduce the resources available for DGs while amplifying cross-tier interference. Our proposed algorithm outperforms HSAPP and CAFIL through its two-stage adaptive mechanism: (1) the coarse cellular pre-allocation phase strictly enforces CU QoS via convex power budgeting (Equation (13)) and (2) the adversarial bandit phase dynamically redistributes residual subcarriers among DGs using interference-weighted reward updates (Equation (18)), effectively decoupling CU-DG resource competition.
Figure 8 compares how the DG multicast performance of the proposed scheme and of the conventional HSAPP and CAFIL approaches scales as the number of BS antennas increases from 256 to 512 in XL-MIMO deployments. At 512 antennas, the proposed algorithm achieves an 8.1–19.4% higher total DG data rate than the benchmarks, and the performance gap widens linearly with the antenna count owing to the enhanced spatial resolution available under near-field spherical-wave propagation.
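To give a feel for why the near-field regime matters at these array sizes, the sketch below evaluates the classical Rayleigh distance, 2A²/λ for an aperture A, for the two antenna counts considered in Figure 8. It assumes a uniform linear array with half-wavelength element spacing and the 30 GHz carrier listed in Table 3. The array geometry, the spacing, and the function name `rayleigh_distance` are assumptions for illustration and are not taken from the paper's system model.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def rayleigh_distance(num_antennas, carrier_hz, spacing_in_wavelengths=0.5):
    """Rayleigh distance 2 * A**2 / wavelength of a uniform linear array.

    Assumption: the aperture A is the end-to-end array length,
    (num_antennas - 1) * element spacing.
    """
    if num_antennas < 2:
        raise ValueError("need at least two antennas to form an aperture")
    wavelength = SPEED_OF_LIGHT / carrier_hz
    aperture = (num_antennas - 1) * spacing_in_wavelengths * wavelength
    return 2.0 * aperture ** 2 / wavelength


z_256 = rayleigh_distance(256, 30e9)
z_512 = rayleigh_distance(512, 30e9)
```

Under these assumptions the Rayleigh distance is roughly 325 m for 256 antennas and roughly 1305 m for 512 antennas. Both values exceed the 200 m cell radius in Table 3, so every user would lie inside the near-field region of the array.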

5. Conclusions

This paper investigated joint subcarrier and power allocation in a D2D XL-MIMO communication system. The goal was to maximize the aggregate multicast data rate of all DGs while satisfying the QoS requirements of the CUs. To fully exploit the benefits of D2D communication, a more general “many-to-many” (M2M) sharing scenario was considered. To solve the resulting 0–1 mixed-integer nonlinear programming problem, a phased strategy was adopted in which the joint subcarrier and power allocation for all DGs and CUs was decoupled into separate joint subcarrier and power allocation problems for the DGs and for the CUs. In phase one, an adversarial MP-MAB was introduced to solve the joint subcarrier and power allocation problem for the DGs, and the improved Exp3 algorithm was applied over successive slots. This algorithm operates in a fully decentralized manner and is capable of converging to the optimal pairing of subcarriers and power levels. In phase two, the power was reallocated to the CUs based on the assignment results for the DGs. The simulation results show that the proposed algorithm improves the aggregate multicast throughput of D2D communications over the HSAPP and CAFIL benchmarks, by 7.9–22.2% at 12 subcarriers and by 8.1–19.4% at 512 BS antennas.
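As a rough illustration of the bandit step summarized above, the following sketch implements the textbook Exp3 update for a single player, with each arm standing for one subcarrier and power combination. It is not the improved Exp3 variant proposed in this paper. The class name `Exp3Player`, the exploration rate `gamma`, the toy arm set, and the reward rule are all assumptions made for illustration, and rewards are assumed to be normalized to [0, 1].

```python
import math
import random
from itertools import product


class Exp3Player:
    """Textbook Exp3 learner for one player (here, one DG)."""

    def __init__(self, arms, gamma=0.1, seed=None):
        self.arms = list(arms)
        if not self.arms:
            raise ValueError("need at least one arm")
        if not 0.0 < gamma <= 1.0:
            raise ValueError("gamma must lie in (0, 1]")
        self.gamma = gamma
        self.weights = [1.0] * len(self.arms)
        self.rng = random.Random(seed)

    def probabilities(self):
        """Mix the normalized weights with uniform exploration."""
        total = sum(self.weights)
        k = len(self.arms)
        return [(1.0 - self.gamma) * w / total + self.gamma / k
                for w in self.weights]

    def select(self):
        """Sample an arm index and return it with its selection probability."""
        probs = self.probabilities()
        index = self.rng.choices(range(len(self.arms)), weights=probs, k=1)[0]
        return index, probs[index]

    def update(self, index, prob, reward):
        """Exponential-weight update using the importance-weighted reward."""
        if not 0.0 <= reward <= 1.0:
            raise ValueError("reward must be normalized to [0, 1]")
        estimate = reward / prob  # unbiased estimate of the arm's reward
        self.weights[index] *= math.exp(self.gamma * estimate / len(self.arms))
        # Rescaling by the largest weight leaves the probabilities unchanged
        # and avoids floating-point overflow in long runs.
        top = max(self.weights)
        self.weights = [w / top for w in self.weights]


def run_toy_example(rounds=200, seed=0):
    # Hypothetical arms: (subcarrier index, transmit power in W).
    arms = list(product(range(2), (0.02, 0.05)))
    player = Exp3Player(arms, gamma=0.2, seed=seed)
    best = 2  # toy environment: only this arm ever pays a reward
    for _ in range(rounds):
        index, prob = player.select()
        reward = 1.0 if index == best else 0.0
        player.update(index, prob, reward)
    return player.arms, player.probabilities()
```

In a multi-player setting, each DG would run its own learner and observe only its own reward, which is one way to read the decentralized operation described above. In the toy run, the selection probability concentrates on the single rewarding arm while every arm keeps at least `gamma / K` probability for exploration.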

Author Contributions

Conceptualization, Z.J., C.M. and M.L.; methodology, Z.J., C.M. and Y.S.; software, Z.J., C.M. and H.L.; validation, Z.J., C.M. and H.L.; formal analysis, Z.J. and Y.S.; investigation, Z.J., C.M. and Y.S.; resources, C.M. and M.L.; writing—original draft preparation, Z.J.; writing—review and editing, C.M. and H.L.; supervision, C.M.; project administration, H.L.; funding acquisition, Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Natural Science Foundation of China under Grant 62101282.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Authors Zhaomin Jian, Chao Ma and Mengshuang Liu were employed by the company State Grid Xinjiang Electric Power Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
HSAPP | heuristic subcarrier assignment and proposed power allocation
CAFIL | channel allocation scheme using full information of device locations
6G | sixth-generation
XL-MIMO | extremely large-scale multi-input and multi-output
BS | base station
D2D | device-to-device
MD2D | multicast device-to-device
QoS | quality of service
DG | device-to-device group
SINR | signal-to-interference-plus-noise ratio
M2M | many-to-many
CU | cellular user
RL | reinforcement learning
MAB | multi-armed bandit
AMAB | adversarial multi-armed bandit
MP-MAB | multi-player multi-armed bandit
MINLP | mixed-integer nonlinear programming
4D | four-dimensional
DR | D2D receiver
DT | D2D transmitter
PMF | probability mass function

References

  1. Dao, N.N.; Pham, Q.V.; Tu, N.H.; Thanh, T.T.; Bao, V.N.Q.; Lakew, D.S.; Cho, S. Survey on aerial radio access networks: Toward a comprehensive 6G access infrastructure. IEEE Commun. Surv. Tutor. 2021, 23, 1193–1225.
  2. Wang, C.X.; You, X.; Gao, X.; Zhu, X.; Li, Z.; Zhang, C.; Wang, H.; Huang, Y.; Chen, Y.; Haas, H.; et al. On the road to 6G: Visions, requirements, key technologies, and testbeds. IEEE Commun. Surv. Tutor. 2023, 25, 905–974.
  3. Lu, H.; Zeng, Y.; You, C.; Han, Y.; Zhang, J.; Wang, Z.; Dong, Z.; Jin, S.; Wang, C.X.; Jiang, T.; et al. A tutorial on near-field XL-MIMO communications towards 6G. IEEE Commun. Surv. Tutor. 2024, 26, 2213–2257.
  4. Rappaport, T.S.; Xing, Y.; Kanhere, O.; Ju, S.; Madanayake, A.; Mandal, S.; Alkhateeb, A.; Trichopoulos, G.C. Wireless communications and applications above 100 GHz: Opportunities and challenges for 6G and beyond. IEEE Access 2019, 7, 78729–78757.
  5. Wu, Q.; Zhang, S.; Zheng, B.; You, C.; Zhang, R. Intelligent reflecting surface-aided wireless communications: A tutorial. IEEE Trans. Commun. 2021, 69, 3313–3351.
  6. Cui, M.; Wu, Z.; Lu, Y.; Wei, X.; Dai, L. Near-field MIMO communications for 6G: Fundamentals, challenges, potentials, and future directions. IEEE Commun. Mag. 2022, 61, 40–46.
  7. Chukhno, N.; Chukhno, O.; Moltchanov, D.; Gaydamaka, A.; Samuylov, A.; Molinaro, A.; Koucheryavy, Y.; Iera, A.; Araniti, G. The use of machine learning techniques for optimal multicasting in 5G NR systems. IEEE Trans. Broadcast. 2022, 69, 201–214.
  8. Ansari, R.I.; Chrysostomou, C.; Hassan, S.A.; Guizani, M.; Mumtaz, S.; Rodriguez, J.; Rodrigues, J.J. 5G D2D networks: Techniques, challenges, and future prospects. IEEE Syst. J. 2017, 12, 3970–3984.
  9. Chen, Y.; He, S.; Hou, F.; Shi, Z.; Chen, J. An efficient incentive mechanism for device-to-device multicast communication in cellular networks. IEEE Trans. Wirel. Commun. 2018, 17, 7922–7935.
  10. Afolabi, R.O.; Dadlani, A.; Kim, K. Multicast scheduling and resource allocation algorithms for OFDMA-based systems: A survey. IEEE Commun. Surv. Tutor. 2012, 15, 240–254.
  11. Sampath, H.; Talwar, S.; Tellado, J.; Erceg, V.; Paulraj, A. A fourth-generation MIMO-OFDM broadband wireless system: Design, performance, and field trial results. IEEE Commun. Mag. 2002, 40, 143–149.
  12. Kim, J.h.; Joung, J.; Lee, J.W. Resource allocation for multiple device-to-device cluster multicast communications underlay cellular networks. IEEE Commun. Lett. 2017, 22, 412–415.
  13. Bhardwaj, A.; Agnihotri, S. A resource allocation scheme for multiple device-to-device multicasts in cellular networks. In Proceedings of the 2016 IEEE Wireless Communications and Networking Conference, Doha, Qatar, 3–6 April 2016; pp. 1–6.
  14. Meshgi, H.; Zhao, D.; Zheng, R. Joint channel and power allocation in underlay multicast device-to-device communications. In Proceedings of the 2015 IEEE International Conference on Communications (ICC), London, UK, 8–12 June 2015; pp. 2937–2942.
  15. Gong, W.; Wang, X. Particle swarm optimization based power allocation schemes of device-to-device multicast communication. Wireless Pers. Commun. 2015, 85, 1261–1277.
  16. Palla, R.K.; Amudala, D.N.; Budhiraja, R. Analysis of URLLC-Enabled Hardware-Impaired Massive MIMO Relaying With D2D Users. IEEE Trans. Veh. Technol. 2024, 73, 10026–10043.
  17. Hamdi, M.; Zaied, M. Resource allocation based on hybrid genetic algorithm and particle swarm optimization for D2D multicast communications. Appl. Soft Comput. 2019, 83, 105605.
  18. Elnourani, M.; Deshmukh, S.; Beferull-Lozano, B. Distributed Resource Allocation in Underlay Multicast D2D Communications. IEEE Trans. Commun. 2021, 69, 3409–3422.
  19. Xu, W.; An, J.; Li, H.; Gan, L.; Yuen, C. Algorithm-unrolling-based distributed optimization for RIS-assisted cell-free networks. IEEE Internet Things J. 2023, 11, 944–957.
  20. Zhang, H.; Shlezinger, N.; Guidi, F.; Dardari, D.; Imani, M.F.; Eldar, Y.C. Beam Focusing for Near-Field Multiuser MIMO Communications. IEEE Trans. Wirel. Commun. 2022, 21, 7476–7490.
  21. Hu, Z.; Chen, C.; Jin, Y.; Zhou, L.; Wei, Q. Hybrid-field channel estimation for extremely large-scale massive MIMO system. IEEE Commun. Lett. 2022, 27, 303–307.
  22. Zhi, K.; Pan, C.; Ren, H.; Chai, K.K.; Wang, C.X.; Schober, R.; You, X. Performance Analysis and Low-Complexity Design for XL-MIMO with Near-Field Spatial Non-Stationarities. IEEE J. Sel. Areas Commun. 2024, 42, 1656–1672.
  23. Gao, M.; Xu, L.; Huang, W. Optimal Resource Allocation for D2D Multicast Communications for XL-MIMO Systems. IEEE Access 2024, 12, 161519–161529.
  24. De Curtò, J.; de Zarzà, I.; Roig, G.; Cano, J.C.; Manzoni, P.; Calafate, C.T. LLM-informed multi-armed bandit strategies for non-stationary environments. Electronics 2023, 12, 2814.
  25. Hashima, S.; Fouda, M.M.; Hatano, K.; Takimoto, E. Advanced Learning Schemes for Metaverse Applications in B5G/6G Networks. In Proceedings of the 2023 IEEE International Conference on Metaverse Computing, Networking and Applications (MetaCom), Kyoto, Japan, 26–28 June 2023; pp. 799–804.
  26. Nomikos, N.; Charalambous, T.; Trakadas, P.; Wichman, R. Bandit-Based Learning-Aided Full-Duplex/Half-Duplex Mode Selection in 6G Cooperative Relay Networks. IEEE Open J. Commun. Soc. 2024, 5, 1415–1429.
  27. Yang, W.; Cai, L.; Shu, S.; Sepahi, A.; Huang, Z.; Pan, J. QoS-driven Contextual MAB for MPQUIC Supporting Video Streaming in Mobile Networks. IEEE Trans. Mob. Comput. 2024, 24, 3274–3287.
  28. Salah, M.M.; Saad, R.S.; Zaki, R.M.; Rabie, K.; ElHalawany, B.M. Multi-Armed Bandits for Resource Allocation in UAV-Assisted LoRa Networks. IEEE Internet Things Mag. 2025, 8, 40–45.
  29. Hashima, S.; Fadlullah, Z.M.; Fouda, M.M.; Mohamed, E.M.; Hatano, K.; ElHalawany, B.M.; Guizani, M. On softwarization of intelligence in 6G networks for ultra-fast optimal policy selection: Challenges and opportunities. IEEE Netw. 2022, 37, 190–197.
  30. Hashima, S.; ElHalawany, B.M.; Hatano, K.; Wu, K.; Mohamed, E.M. Leveraging machine-learning for D2D communications in 5G/beyond 5G networks. Electronics 2021, 10, 169.
  31. Ortiz, A.; Asadi, A.; Engelhardt, M.; Klein, A.; Hollick, M. CBMoS: Combinatorial bandit learning for mode selection and resource allocation in D2D systems. IEEE J. Sel. Areas Commun. 2019, 37, 2225–2238.
  32. Zhang, W.; Song, Y.; Liu, C.; Huang, Z.; Qian, M. Hybrid-Field Channel Estimation for XL-MIMO: A Proximal Gradient Algorithm on the Fixed-Rank Matrix Manifold. In Proceedings of the ICC 2024-IEEE International Conference on Communications, Denver, CO, USA, 9–13 June 2024; pp. 3122–3127.
  33. Qian, M.; Li, C.; Ma, Y.; Song, Y.; Liu, C.; Yin, Z. A Contextual MAB-Based Two-Timescale Scheme for RIS-Assisted Systems. IEEE Wirel. Commun. Lett. 2024, 14, 400–404.
  34. Liang, H.; Liu, C.; Song, Y.; Gao, T.; Zou, Y. Neighbor-based joint spatial division and multiplexing in massive MIMO: User scheduling and dynamic beam allocation. EURASIP J. Adv. Signal Process. 2024, 2024, 1.
  35. Tong, J.; Fu, L.; Han, Z. Throughput enhancement of full-duplex CSMA networks via adversarial multi-player multi-armed bandit. In Proceedings of the 2019 IEEE Global Communications Conference (GLOBECOM), Waikoloa, HI, USA, 9–13 December 2019; pp. 1–6.
Figure 1. Schematic diagram of the system studied for the MD2D XL-MIMO communication scenario.
Figure 2. A flowchart of the proposed two-phase resource allocation scheme.
Figure 3. The relationship between the total data rate of the DGs and the number of slots.
Figure 4. The relationship between the total data rate of the DGs and the number of DGs for JSPAA-IExp3, HSAPP [23], and CAFIL [12] schemes.
Figure 5. The relationship between the total data rate of the DGs and the number of CUs for JSPAA-IExp3, HSAPP [23], and CAFIL [12] schemes.
Figure 6. The relationship between the total data rate of the DGs and the number of subcarriers for JSPAA-IExp3, HSAPP [23], and CAFIL [12] schemes.
Figure 7. The relationship between the total data rate of the DGs and minimum data rate requirements of the CUs for JSPAA-IExp3, HSAPP [23], and CAFIL [12] schemes.
Figure 8. The relationship between the total data rate of the DGs and the number of BS antennas for JSPAA-IExp3, HSAPP [23], and CAFIL [12] schemes.
Table 1. Summary of studies related to resource allocation in D2D networks.
Reference | Apply XL-MIMO | Support M2M | Dynamic Environment | Resource Allocation | Mode Selection | Neighbor Discovery | Apply RL
[12] | | | | | | |
[18] | | | | | | |
[23] | | | | | | |
[30] | | | | | | |
[31] | | | | | | |
Present paper | | | | | | |
Table 2. System parameters.
Notation | Definition
$N$ | Number of BS antennas
$K$ | Number of CUs
$D$ | Number of DGs
$M$ | Number of receiving devices within a DG
$\varpi$ | Antenna element spacing of the BS
$Z$ | Rayleigh distance
$h_{k,0}$ | Near-field channel from CU $k$ to the BS
$h_{d,m}$ | Channel between the transmitting device and receiving device $m$ within DG $d$
$\psi_{k,0}^{c}$ | Channel gain from CU $k$ to the BS on subcarrier $c$
$\psi_{d,0}^{c}$ | Channel gain from the transmitting device in DG $d$ to the BS on subcarrier $c$
$\psi_{k,m}^{c}$ | Channel gain from CU $k$ to receiving device $m$ in DG $d$ on subcarrier $c$
$\psi_{d,m}^{c}$ | Channel gain from the transmitting device in DG $d$ to receiving device $m$ on subcarrier $c$
$R_{\mathrm{req}}$ | Required transmission rate of CUs
$p_{k}^{\max}$ | Maximum power threshold of CUs
$p_{d}^{\max}$ | Maximum power threshold of DGs
Table 3. Relevant simulation parameters and values.
Parameter | Value
Cellular cell radius $R$ | 200 m
Maximum radius of DGs | 20 m
Position of CUs and users of DGs | Uniformly distributed in the semicircle of $[0, R]$
Number of CUs $K$ | 9
Number of DGs $D$ | 3
Number of receiving devices $M$ within each DG | 3
Number of BS antennas | 512
Number of subcarriers $C$ | 10
Path loss constant $\alpha$ | 2.2
Number of multipaths $L$ | 3
Carrier center frequency | 30 GHz
Maximum power of CUs $p_{k}^{\max}$ | 20 dBm
Maximum power of DGs $p_{d}^{\max}$ | 17 dBm
Data rate requirement of CUs $R_{\mathrm{req}}$ | 1 bps/Hz
SINR | 20 dB
Time period $T$ | 1000 slots
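The values in Table 3 can be collected into a small configuration object, sketched below as one possible way to set up a comparable simulation. The field names, the helper `dbm_to_watts`, and the sampler `sample_in_semicircle` are hypothetical. Reading "uniformly distributed" as uniform over the area of the semicircle is also an assumption, since the original simulation code is not given.

```python
import math
import random
from dataclasses import dataclass


def dbm_to_watts(dbm):
    """Convert a power level from dBm to watts."""
    return 10.0 ** ((dbm - 30.0) / 10.0)


@dataclass(frozen=True)
class SimulationConfig:
    """Default values copied from Table 3; the field names are illustrative."""
    cell_radius_m: float = 200.0
    max_dg_radius_m: float = 20.0
    num_cus: int = 9
    num_dgs: int = 3
    receivers_per_dg: int = 3
    num_bs_antennas: int = 512
    num_subcarriers: int = 10
    path_loss_constant: float = 2.2
    num_multipaths: int = 3
    carrier_hz: float = 30e9
    cu_max_power_dbm: float = 20.0
    dg_max_power_dbm: float = 17.0
    cu_rate_req_bps_hz: float = 1.0
    sinr_db: float = 20.0
    num_slots: int = 1000


def sample_in_semicircle(radius, rng):
    """Draw one (x, y) point uniformly over the area of a half disc.

    Assumption: the BS sits at the origin and the half disc covers y >= 0.
    The square root keeps the density uniform per unit area.
    """
    r = radius * math.sqrt(rng.random())
    theta = math.pi * rng.random()
    return r * math.cos(theta), r * math.sin(theta)


config = SimulationConfig()
rng = random.Random(0)
cu_positions = [sample_in_semicircle(config.cell_radius_m, rng)
                for _ in range(config.num_cus)]
```

With these conversions, the 20 dBm CU limit corresponds to 0.1 W and the 17 dBm DG limit to about 0.05 W.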
Table 4. Comparison of computational complexity of different schemes.
Algorithm | Computational Complexity
JSPAA-IExp3 | $O\left(\lvert D\rvert\,\lvert A_{d}^{\mathrm{re}}\rvert\,T\right)$
HSAPP | $O\left(\left(2\lvert C\rvert(\lvert K\rvert+\lvert D\rvert)+\lvert K\rvert+\lvert K\rvert f(D_{D})\right)^{0.5}\left(3\lvert C\rvert(\lvert K\rvert+\lvert D\rvert)+\lvert K\rvert+\lvert D\rvert f(D_{D})\right)(\lvert K\rvert+\lvert D\rvert)^{2}\lvert C\rvert^{2}\right)$
CAFIL | $O\left(\lvert K\rvert+\lvert D\rvert(\lvert D_{d}\rvert+1)\right)$
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Jian, Z.; Ma, C.; Song, Y.; Liu, M.; Liang, H. Flexible Resource Optimization for D2D XL-MIMO Communication via Adversarial Multi-Armed Bandit. Electronics 2025, 14, 1498. https://doi.org/10.3390/electronics14081498
