1. Introduction
The rapid advancements in wireless communication are enabling better connectivity and faster data transfers [1]. As we move from 4G/5G to future-generation networks such as beyond 5G and 6G, there is a growing demand for higher data rates, driven in particular by the large number of devices in the Internet of Things (IoT) [2] and sensor networks [3,4]. This evolution also requires reduced latency and enhanced spectrum efficiency, which are becoming increasingly significant [5,6,7,8]. Massive multiple-input multiple-output (MIMO) systems are therefore essential for meeting these demands. They do so by deploying hundreds to thousands of antennas at the base station (BS) [9], an arrangement that provides improved spatial multiplexing, beamforming, and interference reduction [10]. However, the development and deployment of such large systems is challenging, especially in signal detection [11], due to the complex characteristics of the received signals. High-performance and efficient detection algorithms are needed to exploit the full capabilities of massive MIMO while minimizing computational costs [12]. The challenge of efficient signal detection in large-scale antenna arrays has garnered significant attention in recent research [13]. The performance of massive MIMO systems is strongly influenced by the complexity of the detection algorithms employed, as these algorithms often require extensive computational resources. Although the maximum-likelihood (ML) detector is recognized for its optimal performance [14], its computational demands grow exponentially with the number of transmitters [15], as it involves evaluating all possible transmitted symbol vectors. To address this, simple linear detectors such as zero-forcing (ZF) and linear minimum mean square error (MMSE) are commonly used. These detectors offer a good balance between computational efficiency and performance [16]. In the massive MIMO regime, linear MMSE is particularly appropriate due to its favorable trade-off between complexity and performance [17]. As the number of antennas increases, it approaches optimal estimation performance due to channel hardening effects, while maintaining relatively low computational complexity [18]. It also improves signal quality by mitigating the impact of noise and interference due to the large number of antennas [19]. This makes linear MMSE an excellent choice for practical massive MIMO implementations, offering near-optimal performance. However, although linear MMSE is simple to formulate, its computational complexity remains a limitation, particularly under heavy system loads [20]. The high complexity stems from the inversion of a large matrix, which poses great challenges for implementation.
  1.1. Related Work
To overcome the complexity of direct inversion, several approximate matrix inversion and iterative methods have been developed. In massive MIMO, these methods can achieve a good trade-off between performance and complexity because the BS has many more antennas than there are users [21]. In [22,23], Neumann series (NS) detection was developed to approximate large matrix inversion. NS achieves low complexity if the ratio of BS antennas to users is large (e.g., ≥16). The complexity is reduced from $\mathcal{O}(K^3)$ to $\mathcal{O}(K^2)$ when $i \le 2$. But for $i \ge 3$, the complexity is even higher than that of the direct MMSE solver. Here, $K$ represents the number of users and $i$ the number of iterations. Other methods for approximating the matrix inverse, such as Newton iteration (NI) and its improved version, diagonal band NI, have been developed and converge faster than NS [24]. NI was further improved in [25], requiring fewer iterations to achieve matrix inversion with the required accuracy. However, as the number of users increases, more iterations are needed to obtain satisfactory results, and the computational complexity increases accordingly. This has led to the development of more sophisticated algorithms, i.e., iterative algorithms.
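As a rough illustration of the Neumann series idea discussed above, the following sketch approximates the inverse of a regularized Gram matrix by a truncated series around its diagonal. It is a generic textbook construction, not the implementation of [22,23]; the function names, the test dimensions (256 × 16), and the noise variance are assumptions chosen only for the example.

```python
import numpy as np

def neumann_inverse(A, num_terms):
    """Approximate inv(A) with num_terms terms of a Neumann series around diag(A)."""
    d_inv = 1.0 / np.diag(A)              # inverse of the diagonal part D
    E = A - np.diag(np.diag(A))           # off-diagonal part, so that A = D + E
    M = -d_inv[:, None] * E               # iteration matrix -D^{-1} E
    term = np.diag(d_inv)                 # n = 0 term: D^{-1}
    approx = term.copy()
    for _ in range(1, num_terms):
        term = M @ term                   # n-th term: (-D^{-1} E)^n D^{-1}
        approx = approx + term
    return approx

def regularized_gram(N, K, noise_var, rng):
    """Hypothetical test matrix H^H H + noise_var * I for an i.i.d. Rayleigh channel."""
    H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
    return H.conj().T @ H + noise_var * np.eye(K)

rng = np.random.default_rng(0)
A = regularized_gram(N=256, K=16, noise_var=0.1, rng=rng)
exact = np.linalg.inv(A)
rel_errors = [
    np.linalg.norm(neumann_inverse(A, t) - exact) / np.linalg.norm(exact)
    for t in (1, 2, 3, 4)
]
print(rel_errors)  # relative error of the approximation for 1 to 4 series terms
```

The first two terms only involve a diagonal matrix and one scaled copy of the off-diagonal part, whereas every further term needs the dense matrix product `M @ term`. That product is the step that brings the cubic cost back once more than two terms are kept.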
The concept behind iterative techniques is to estimate the transmitted symbols without calculating the inverse of the large matrix. Typically, these methods have significantly lower complexity than approximate matrix inversion methods [26]. Several iterative signal detectors have therefore been investigated recently. Richardson iteration (RI) [27,28,29] is employed to achieve near-MMSE performance. It requires more iterations to achieve optimal results [30], and it is greatly affected by the relaxation parameter [31]. The work in [32] introduced the joint steepest descent algorithm to achieve optimal performance. However, its slow convergence leads to high computational complexity, making it inefficient for large-scale systems. To achieve better performance in challenging conditions, successive over-relaxation (SOR) [33,34] and Gauss–Seidel (GS) [35,36,37,38] methods have been proposed. These techniques have drawn significant attention due to their ability to attain nearly optimal performance while reducing complexity. However, their sequential iteration process makes parallel implementation difficult. Another promising approach in iterative detection is the accelerated over-relaxation (AOR) method [39], where the authors presented a comprehensive analysis of AOR for massive MIMO systems, highlighting its potential for improved convergence rates compared to traditional methods. However, the performance of the AOR method relies heavily on the choice of its relaxation and acceleration parameters, which is a challenge under varying system conditions. AOR was further improved in [40], which combines the AOR method with the Neumann series to reduce computational complexity. This method, however, is again highly dependent on the values of the acceleration and relaxation parameters. Most of these methods adopt fixed parameter values, which reduces their detection accuracy and performance under large antenna configurations and continuously varying channel conditions.
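To make the role of the two AOR parameters concrete, the sketch below implements the classical AOR iteration for a linear system $\mathbf{A}\mathbf{x} = \mathbf{b}$ with the splitting $\mathbf{A} = \mathbf{D} - \mathbf{L} - \mathbf{U}$. This is the standard textbook form, not necessarily the exact update used in [39] or in this paper; the names `omega` (relaxation) and `r` (acceleration), the diagonal initial guess, and the test dimensions are assumptions for illustration.

```python
import numpy as np

def aor_solve(A, b, omega, r, num_iters):
    """Classical AOR iteration for A x = b with the splitting A = D - L - U."""
    D = np.diag(np.diag(A))
    L = -np.tril(A, k=-1)                 # negated strictly lower triangular part
    U = -np.triu(A, k=1)                  # negated strictly upper triangular part
    lhs = D - r * L                       # lower triangular system matrix
    rhs_mat = (1 - omega) * D + (omega - r) * L + omega * U
    x = b / np.diag(A)                    # simple diagonal initial guess (assumption)
    for _ in range(num_iters):
        # A dedicated forward substitution would be used in practice;
        # np.linalg.solve keeps the sketch short.
        x = np.linalg.solve(lhs, rhs_mat @ x + omega * b)
    return x

# Hypothetical test problem: regularized Gram matrix of an i.i.d. Rayleigh channel.
rng = np.random.default_rng(0)
N, K = 256, 16
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
A = H.conj().T @ H + 0.1 * np.eye(K)
b = rng.standard_normal(K) + 1j * rng.standard_normal(K)

x_direct = np.linalg.solve(A, b)
x_aor = aor_solve(A, b, omega=1.05, r=1.0, num_iters=25)
rel_err = np.linalg.norm(x_aor - x_direct) / np.linalg.norm(x_direct)
print(rel_err)
```

Setting `r = omega` recovers SOR, and `r = omega = 1` recovers Gauss–Seidel, which is why all three detectors share the same sensitivity to how these parameters are chosen.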
Concurrently, nature-inspired optimization algorithms have been increasingly used in many areas of wireless communication [41,42,43]. Particle swarm optimization (PSO), in particular, has shown remarkable versatility in solving complex optimization problems. The work in [44] demonstrated the effectiveness of PSO in system identification, while [45] proposed an improved PSO variant with enhanced convergence characteristics. These results motivate the use of PSO for tuning the parameters of over-relaxation methods in signal detection. In summary, most detection algorithms have high computational complexity, which grows as the number of antennas increases. Achieving a good trade-off between accuracy and computational cost is therefore difficult, particularly in more complex system environments. Moreover, over-relaxation detectors such as AOR, SOR, and GS lack a systematic way to choose their parameters, which causes performance loss in large antenna systems and under different channel conditions [12]. These research gaps highlight the need for signal detection approaches that select near-optimal parameters while keeping the computational complexity low in high-dimensional MIMO conditions.
  1.2. Contributions and Outline of the Paper
The primary objective of this work is to optimize the AOR approach by utilizing a novel PSO algorithm. This optimization aims to balance computational complexity and detection accuracy in massive MIMO systems. Considering the challenges introduced by the increasing number of antennas, our approach addresses the need for an efficient signal detection method that can adapt to various channel conditions and system configurations. To achieve this goal, we propose two algorithms. The first is a modification of the nature-inspired PSO, called novel PSO, which has an improved inertia weight and improved acceleration coefficients. The second, and main, algorithm is the optimized AOR (OAOR) detector, which is based on the proposed novel PSO. The main contributions can be summarized as follows:
- First, we introduce a modified PSO variant, called novel PSO, which incorporates adaptive acceleration coefficients to improve convergence in the context of MIMO detection. This enhancement contributes to both the efficiency and effectiveness of the optimization process. We evaluate this modified PSO separately to assess its effectiveness. 
- Next, we propose the OAOR method utilizing the novel PSO that more efficiently handles the varying number of antennas. The scalability of the proposed novel algorithm can successfully address the growing demands of next-generation wireless communication systems. This method efficiently incorporates the adaptive acceleration coefficients and inertia weight, which further refines the optimization process and substantially improves convergence. Novel PSO-incorporated OAOR detectors significantly reduce the number of iterations to achieve the desired detection accuracy and consequently reduce the computational cost. In addition, the sensitivity of conventional AOR to relaxation and acceleration parameters is also more efficiently handled in novel PSO-based detection algorithms. 
- A detailed evaluation of the OAOR method is conducted, focusing on both the computational complexity and error rate performance. The scalability of the OAOR approach is assessed across diverse antenna configurations using extensive simulations. This in-depth analysis highlights the method’s robustness and flexibility, demonstrating its effectiveness in various scenarios. The numerical results reveal that the proposed algorithms outperform existing methods in terms of computational efficiency and symbol error rate (SER). 
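The idea of letting a swarm choose the AOR parameters can be sketched as follows. This is a plain PSO with a linearly decreasing inertia weight and fixed acceleration coefficients, not the novel PSO proposed in this paper. The fitness function (residual norm after three AOR iterations), the search bounds, the swarm size, and all names are assumptions made only for illustration.

```python
import numpy as np

def aor_solve(A, b, omega, r, num_iters):
    """Classical AOR iteration for A x = b with the splitting A = D - L - U."""
    D = np.diag(np.diag(A))
    L, U = -np.tril(A, k=-1), -np.triu(A, k=1)
    lhs = D - r * L
    rhs_mat = (1 - omega) * D + (omega - r) * L + omega * U
    x = b / np.diag(A)
    for _ in range(num_iters):
        x = np.linalg.solve(lhs, rhs_mat @ x + omega * b)
    return x

def pso_minimize(fitness, bounds, num_particles=12, num_iters=20, seed=0):
    """Basic PSO; returns the best position found and its fitness value."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    pos = rng.uniform(lo, hi, size=(num_particles, len(lo)))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    g = int(np.argmin(pbest_val))
    gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    for t in range(num_iters):
        w = 0.9 - 0.5 * t / max(num_iters - 1, 1)   # inertia weight from 0.9 down to 0.4
        c1 = c2 = 1.5                               # fixed acceleration coefficients
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = np.clip(pos + vel, lo, hi)            # keep particles inside the bounds
        vals = np.array([fitness(p) for p in pos])
        improved = vals < pbest_val
        pbest[improved] = pos[improved]
        pbest_val[improved] = vals[improved]
        g = int(np.argmin(pbest_val))
        if pbest_val[g] < gbest_val:
            gbest, gbest_val = pbest[g].copy(), pbest_val[g]
    return gbest, gbest_val

# Hypothetical loaded system: 64 BS antennas, 16 users.
rng = np.random.default_rng(1)
N, K = 64, 16
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
A = H.conj().T @ H + 0.1 * np.eye(K)
b = rng.standard_normal(K) + 1j * rng.standard_normal(K)

def residual_after_three_iterations(params):
    omega, r = params
    return float(np.linalg.norm(A @ aor_solve(A, b, omega, r, 3) - b))

best, best_val = pso_minimize(residual_after_three_iterations, [(0.5, 1.5), (0.5, 1.5)])
print(best, best_val)
```

The search is two-dimensional, so a small swarm is enough for this toy problem. An adaptive scheme such as the one described in the contributions above would replace the fixed `w`, `c1`, and `c2` schedule.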
The remainder of this paper is arranged as follows: Section 2 provides a detailed system model for a massive MIMO system. Section 3 presents the proposed OAOR method, including its detailed derivation, convergence analysis, the modified novel PSO algorithm, and integration with OAOR. Section 4 describes the comprehensive analysis of computational complexity. The simulation results of detection performance in terms of SER are presented in Section 5. Section 6 provides the final observations and discusses future aspects.
  2. System Model
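Let us consider an uncoded massive MIMO uplink scenario, where the BS is equipped with $N$ antennas and communicates with $K$ single-antenna users over a Rayleigh fading channel [46,47]. Rayleigh fading is commonly used to model wireless communication environments due to its practical relevance in reflecting multipath propagation effects. The received signal at the BS can be mathematically expressed as

$$\mathbf{y} = \mathbf{H}\mathbf{x} + \mathbf{n}, \tag{1}$$

where $\mathbf{y} \in \mathbb{C}^{N \times 1}$ is the received signal vector. The channel matrix between users and the BS is represented as $\mathbf{H} \in \mathbb{C}^{N \times K}$. $\mathbf{x} \in \mathbb{C}^{K \times 1}$ is the vector of the transmitted signal from the user end, and $\mathbf{n} \in \mathbb{C}^{N \times 1}$ is the additive white Gaussian noise (AWGN) with zero mean and variance $\sigma^2$. The described system model in the form of a matrix can be written as

$$\begin{bmatrix} y_1 \\ \vdots \\ y_N \end{bmatrix} = \begin{bmatrix} h_{11} & \cdots & h_{1K} \\ \vdots & \ddots & \vdots \\ h_{N1} & \cdots & h_{NK} \end{bmatrix} \begin{bmatrix} x_1 \\ \vdots \\ x_K \end{bmatrix} + \begin{bmatrix} n_1 \\ \vdots \\ n_N \end{bmatrix}. \tag{2}$$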
The objective is to recover the transmitted signal vector $\mathbf{x}$ from the received vector $\mathbf{y}$. To accomplish this, we employ the linear MMSE criterion, which minimizes the mean square error between the estimated signal vector $\hat{\mathbf{x}}$ and the actual transmitted signal vector $\mathbf{x}$. The estimate of $\mathbf{x}$ using linear MMSE is given by

$$\hat{\mathbf{x}} = \left(\mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I}\right)^{-1}\mathbf{H}^H\mathbf{y}. \tag{3}$$

Here, $\mathbf{H}^H$ denotes the Hermitian transpose of the channel matrix $\mathbf{H}$, $\mathbf{I}$ is the identity matrix, and $\sigma^2$ is a regularization parameter. For simplicity, we refer to the linear MMSE estimator simply as MMSE in subsequent sections to maintain consistency.
In massive MIMO systems, directly computing the inverse of the matrix $\mathbf{H}^H\mathbf{H} + \sigma^2\mathbf{I}$, as shown in Equation (3), is computationally prohibitive due to the high dimensionality of the matrices involved. MMSE can obtain near-optimal performance; however, when a large number of antennas is employed, the cost of this matrix inversion becomes very high. To mitigate this computational complexity in detection, iterative methods are often employed.
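The system model and the MMSE estimate described above can be reproduced with a few lines of NumPy. The sketch below is only an illustration under stated assumptions: an i.i.d. Rayleigh channel with unit-variance entries, unit-energy QPSK symbols (instead of the higher-order QAM used later), the noise variance as the regularization term, and hypothetical dimensions of 128 × 16.

```python
import numpy as np

def mmse_detect(H, y, noise_var):
    """Linear MMSE estimate via a direct solve of (H^H H + noise_var I) x = H^H y."""
    K = H.shape[1]
    A = H.conj().T @ H + noise_var * np.eye(K)   # regularized Gram matrix
    b = H.conj().T @ y                           # matched-filter output
    return np.linalg.solve(A, b)

rng = np.random.default_rng(1)
N, K, noise_var = 128, 16, 0.01

# i.i.d. Rayleigh channel: complex Gaussian entries with unit variance
H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
# unit-energy QPSK symbols, one per single-antenna user
x = (rng.choice([-1, 1], K) + 1j * rng.choice([-1, 1], K)) / np.sqrt(2)
# complex AWGN with total variance noise_var per receive antenna
n = np.sqrt(noise_var / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))

y = H @ x + n
x_hat = mmse_detect(H, y, noise_var)

# hard decision: map each estimate to the nearest QPSK point
detected = (np.sign(x_hat.real) + 1j * np.sign(x_hat.imag)) / np.sqrt(2)
print(np.count_nonzero(~np.isclose(detected, x)), "symbol errors")
```

The direct solve inside `mmse_detect` is the cubic-cost step that iterative detectors are designed to avoid.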
  4. Analysis of Computational Complexity
The OAOR method’s computational complexity, measured in complex-valued multiplications, can be derived by analyzing its two main components, i.e., the iterative detection process and the novel PSO algorithm. To begin, consider the OAOR iteration in Equation (6), which updates the signal estimate using the two optimized parameters and the component matrices derived from the channel matrix. The major operations are two matrix–vector multiplications together with several vector additions and scalar multiplications. The matrix–vector multiplications dominate the complexity, each requiring $\mathcal{O}(K^2)$ operations for a system with $K$ users. Summing up these operations and accounting for $i$ iterations, we derive the iterative part’s complexity as $\mathcal{O}(iK^2)$, or more precisely,
 complex multiplications.
As OAOR includes the novel PSO for obtaining the optimized parameters, its fitness function is evaluated to reduce the mean square error. The PSO algorithm operates on a population of $P$ particles over $T$ iterations. In each PSO iteration, the fitness function is evaluated for every particle in order to update the particle’s velocity and position as well as the personal and global best solutions. The fitness function, which estimates the mean square error, is the most computationally expensive part. A straightforward implementation of the fitness function would involve operations similar to an OAOR iteration, with a complexity of $\mathcal{O}(K^2)$. However, this can be optimized using sampling and approximation techniques, reducing it to
, keeping in mind that it will be a one-time cost for that particular system’s configuration and channel conditions, and will not impact each iteration. To compute its complexity, we multiply the number of particles (
P), the number of iterations (
T), and the fitness function complexity: 
. This shows that OAOR maintains an overall complexity order of $\mathcal{O}(K^2)$, lower than the $\mathcal{O}(K^3)$ complexity of MMSE detection for large $K$. To illustrate, consider a system with $K = 128$ users. MMSE would require $K^3 = 128^3 =$ 2,097,152 operations, while OAOR with $i = 4$ iterations would need $iK^2 = 4 \times 128^2 =$ 65,536 operations for the iterative part, plus the PSO contribution. Even with PSO, the complexity of OAOR remains substantially lower than that of MMSE for this large system. This means the efficiency of OAOR is more promising as the number of users
$K$ increases. This scalability also gives OAOR an advantage over the traditional AOR method in large antenna systems. To further validate the complexity of our proposed method, we compare it with the complexities of the AOR, IGS, and efficient successive over-relaxation (E-SOR) algorithms. The results indicate that our proposed method offers a good balance between complexity and detection accuracy. The comparison of the complexity of these algorithms is given in Figure 4. In this figure, although AOR has lower complexity, it does not reach the detection accuracy of the proposed method within the same iteration count. Similarly, the complexity of the IGS method increases for higher iteration counts (i.e.,
) when the antenna configuration ratio is very small, because more iterations are then required to attain near-MMSE performance. Overall, our proposed method presents a favorable trade-off between complexity and performance.
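The operation counts quoted above can be checked with a short calculation. The sketch assumes the simplified cost models $K^3$ for direct MMSE inversion and $iK^2$ for the iterative part, and it assumes $K = 128$ and $i = 4$, which are the values implied by the quoted counts of 2,097,152 and 65,536. The PSO contribution is deliberately left out, and the function names are hypothetical.

```python
def mmse_inversion_ops(K):
    """Simplified cost model for direct matrix inversion: K^3 multiplications."""
    return K ** 3

def iterative_ops(K, i):
    """Simplified cost model for the iterative part: i * K^2 multiplications."""
    return i * K ** 2

K, i = 128, 4
mmse_ops = mmse_inversion_ops(K)
oaor_ops = iterative_ops(K, i)

print(mmse_ops)             # 2097152
print(oaor_ops)             # 65536
print(mmse_ops // oaor_ops) # 32, i.e., K / i
```

Under these models the ratio is simply $K/i$, so the advantage of the iterative part grows linearly with the number of users as long as the iteration count stays fixed.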
  5. Simulation Results and Discussion
This section presents the numerical results and performance analysis of the proposed OAOR method. The performance of the OAOR method is first compared with the conventional AOR [39] and IGS [38]. We then extend the comparison to include another method, E-SOR [34], in terms of SER for various system configurations, with MMSE used as the benchmark. The SER is an important metric in communication systems for evaluating the performance of signal detection [53,54]. The simulations are performed for 16-, 32-, and 64-quadrature amplitude modulation (QAM) schemes, assuming perfect channel state information (CSI) at the receiver. The results are obtained by averaging the SER over a suitable number of Monte Carlo trials, i.e.,
 to 
.
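A Monte Carlo SER estimate of the kind described above can be sketched as follows for the MMSE benchmark. The setup is illustrative only: a square QAM constellation normalized to unit average energy, a per-user SNR definition in which the noise variance equals the inverse of the linear SNR, nearest-point hard decisions, and small hypothetical dimensions and trial counts so that the example runs quickly.

```python
import numpy as np

def qam_constellation(M):
    """Square M-QAM constellation (M = 16, 64, ...) with unit average energy."""
    m = int(round(np.sqrt(M)))
    levels = np.arange(-(m - 1), m, 2)
    points = (levels[:, None] + 1j * levels[None, :]).ravel()
    return points / np.sqrt(np.mean(np.abs(points) ** 2))

def simulate_ser(N, K, M, snr_db, num_trials, seed=0):
    """Average symbol error rate of MMSE detection over random channels."""
    rng = np.random.default_rng(seed)
    const = qam_constellation(M)
    noise_var = 1.0 / 10 ** (snr_db / 10)     # assumed SNR definition
    errors = 0
    for _ in range(num_trials):
        H = (rng.standard_normal((N, K)) + 1j * rng.standard_normal((N, K))) / np.sqrt(2)
        idx = rng.integers(0, M, size=K)      # transmitted symbol indices
        x = const[idx]
        n = np.sqrt(noise_var / 2) * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
        y = H @ x + n
        A = H.conj().T @ H + noise_var * np.eye(K)
        x_hat = np.linalg.solve(A, H.conj().T @ y)
        # nearest-constellation-point decision for every user
        detected = np.argmin(np.abs(x_hat[:, None] - const[None, :]), axis=1)
        errors += np.count_nonzero(detected != idx)
    return errors / (num_trials * K)

for snr_db in (-10, 0, 10):
    print(snr_db, simulate_ser(N=64, K=8, M=16, snr_db=snr_db, num_trials=200))
```

Replacing the direct solve with an iterative detector and sweeping the iteration count would produce SER-versus-iteration curves of the type discussed in this section. A 32-QAM cross constellation would need its own point set, since the helper only covers square constellations.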
To evaluate the performance of the proposed method, the simulation results of SER as a function of iteration number are presented in Figure 5. An antenna configuration of $N \times K = 384 \times 64$ is considered to compare the results across different SNR levels in 64-QAM. The results demonstrate that the performance of the proposed method stabilizes after only a few iterations, which indicates fast convergence. It is also evident that fewer iterations are required at lower SNR values. For example, at SNR levels of 8 and 10 dB, only 2 iterations are required for stability. This also suggests that the proposed method performs robustly under varying conditions and in low-SNR situations, which makes it more suitable for practical use.
Now we evaluate the performance comparison of the proposed method with other detectors in different antenna configurations. First, we take a less dense antenna scenario with 
 BS antennas and 
 users in 16- and 64-QAM modulation schemes. 
Figure 6 presents the comparison in 16-QAM, while 
Figure 7 presents that with 64-QAM modulation. In both figures, the performance of all methods improves with the increase in iteration count, but the performance of the proposed method is better than all other methods at iteration 
 in 16-QAM and 
 in 64-QAM, closely approaching the near-optimal performance of MMSE. It can be observed that AOR converges very slowly for such a small ratio of BS antennas to users, resulting in a large estimation error. This validates the effectiveness of the OAOR method in achieving near-MMSE performance with high detection accuracy.
Figure 8 illustrates the SER performance versus SNR for different signal detectors in 16-QAM modulation for the antenna configuration of 
. Here, the proposed method demonstrates superior performance compared to the AOR and IGS detectors and achieves near-optimal performance of the MMSE detector with only four iterations. It highlights its ability to provide excellent detection accuracy of the received signal. If we employ a higher modulation of 64-QAM in 
Figure 9 for the same antenna configuration, we can observe that the AOR error rate is higher as compared to our proposed detector. IGS has slightly better performance than AOR, but the detection accuracy of our method is better than IGS at iteration 
. In these results, although AOR and IGS have better performance at the initial 2 iterations, at the final iteration, our proposed method surpasses the other detectors by achieving near-MMSE performance with the lowest error rate.
 Now we evaluate the performance of various detectors in challenging conditions by increasing the number of BS antennas 
N and users 
K. First, we test our proposed method for 
 and 
 employing 16-QAM in 
Figure 10 and compare its performance with other methods. It shows that the proposed method achieves optimal performance while other methods struggle, and their error rates are higher than OAOR. In the higher modulation of 64-QAM, 
Figure 11 shows that with one additional iteration, the proposed method still performs optimally. The overall error rate in 16-QAM is lower than that in 64-QAM because of the larger Euclidean distance between constellation points. However, the proposed method still maintains a substantial performance gap over AOR and IGS, especially at high SNR levels. Overall, in both modulation schemes, the proposed method exhibits a faster decay in SER at higher SNR values, which indicates faster convergence at a small 
 antenna ratio.
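The remark about Euclidean distance can be quantified. For a square $M$-QAM constellation normalized to unit average energy, the minimum distance between neighboring points is $\sqrt{6/(M-1)}$. The sketch below computes it directly from the amplitude levels; the unit-energy normalization and the function name are assumptions for illustration.

```python
import numpy as np

def square_qam_min_distance(M):
    """Minimum distance of a square M-QAM constellation with unit average energy."""
    m = int(round(np.sqrt(M)))
    levels = np.arange(-(m - 1), m, 2).astype(float)   # ..., -3, -1, 1, 3, ...
    avg_energy = 2 * np.mean(levels ** 2)              # in-phase plus quadrature energy
    return 2 / np.sqrt(avg_energy)                     # neighboring levels are 2 apart

d16 = square_qam_min_distance(16)
d64 = square_qam_min_distance(64)
print(d16, d64, d16 / d64)
```

At equal average symbol energy, the 16-QAM points are roughly twice as far apart as the 64-QAM points, so the same noise and residual interference cause fewer decision errors in 16-QAM.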
Now, we simulate a more complex antenna scenario employing 600 BS antennas and 200 users. In 
Figure 12, we evaluate the performance of different detectors for 16-QAM modulation and 
Figure 13 utilizes 64-QAM. Both results show that the proposed method consistently gives good results under very challenging conditions and achieves near-MMSE performance earlier than the IGS and AOR. If we closely observe the performance of the final iteration in 
Figure 12, to achieve the SER level of 
, we can see that our proposed method reaches this error rate within 0.1 dB of the SNR required by MMSE. Meanwhile, the error rate of the AOR detector at the same SNR is very high. A similar trend can be observed in the higher modulation scheme of 64-QAM, as shown in 
Figure 13. In this case, the proposed method required one extra iteration to achieve near-MMSE results. In contrast, all other methods require a greater number of iterations to achieve the same performance at this higher order of modulation.
Figure 14 and 
Figure 15 show the performance comparison between the proposed algorithm and E-SOR in 32-QAM signaling. We can see that E-SOR has a trend similar to that of AOR and IGS, but it performs slightly better than IGS because it handles iterations efficiently while having a fixed relaxation parameter value. However, its performance is still lower compared to the proposed method because of its fixed parametric values. By increasing the number of user antennas from 18 to 21, we observe that E-SOR, at a higher iteration count (i.e., 
i = 4), surpasses IGS by 0.21 dB at an SER level of 
. Nonetheless, it shows a larger gap to MMSE than the proposed method, which demonstrates that our proposed method can be applied across diverse signaling and antenna configurations. Overall, this shows that the proposed method scales well to a large number of antennas at both the transmitter and the receiver in very dense and heavily loaded system configurations, which makes it more suitable for practical implementation.
 In summary, the OAOR method consistently delivers near-MMSE performance across various system configurations, outperforming other detectors. Specifically, it converges quickly and achieves the desired performance with fewer iterations. Moreover, the complexity analysis, as discussed in 
Section 4, already reveals the superior performance of the proposed algorithm. Thus, we can conclude that the proposed method’s ability to converge quickly and attain near-MMSE performance with fewer iterations, together with its ability to efficiently handle large antenna systems, makes it a promising candidate for signal detection in massive MIMO systems. Additionally, this method can be applied to other signal processing problems in which low complexity is required, such as channel estimation in massive MIMO systems.