Design and Performance Analysis of NadamSPGD Algorithm for Sensor-Less Adaptive Optics in Coherent FSOC Systems

Sensor-less adaptive optics (SLAO) based on stochastic parallel gradient descent (SPGD) is effective for compensating atmospheric turbulence in coherent free-space optical communication (CFSOC) systems. However, SPGD converges slowly and easily becomes trapped in local extrema. Therefore, we propose a novel NadamSPGD algorithm for efficient wavefront correction that combines Nesterov-accelerated adaptive moment estimation (Nadam) with SPGD. Specifically, Nesterov's accelerated gradient (NAG) momentum and adaptive gain coefficients are integrated into conventional SPGD to accelerate its convergence and to avoid convergence to local extremum points. Theoretical analysis, numerical simulations and experimental results demonstrate that NadamSPGD can increase the convergence speed by ~50% and significantly improve robustness to parameter settings, and thus more efficiently suppresses the negative effects of atmospheric turbulence on mixing efficiency (ME) and bit error rate (BER). Our algorithm also shows better dynamic performance under strong turbulence and high Greenwood frequency conditions, making it more suitable for real-time SLAO systems. This study shows that the NadamSPGD algorithm is suitable for SLAO in CFSOC systems and is a viable substitute for SPGD to improve the quality of optical communications.


Introduction
The free-space optical communication (FSOC) system has developed rapidly in modern communications owing to its security, high communication speed and license-free operation [1,2]. Recently, coherent free-space optical communication (CFSOC) has attracted more attention for its longer relay distance, higher sensitivity and better receiver selectivity compared with conventional FSOC [3][4][5]. However, the application of CFSOC is seriously hindered by atmospheric turbulence: the wavefront distortions it causes at the receiver severely degrade the mixing efficiency (ME) and the bit error rate (BER) of the CFSOC system [6][7][8]. Adaptive optics (AO) is considered an effective method to compensate wavefront aberrations induced by atmospheric turbulence, and many applications of AO in CFSOC have achieved significant results [9][10][11]. In conventional AO systems, the Shack-Hartmann wavefront sensor (SH-WFS) is widely used and directly determines the system performance. However, owing to inherent limitations of its working principle, it is difficult to obtain satisfactory measurement accuracy under strong scintillation or low optical power [12,13], which directly degrades system performance. Therefore, sensor-less adaptive optics (SLAO) systems based on multi-dimensional optimization algorithms have been proposed. An SLAO system optimizes the performance indicators of CFSOC directly from the received images and no longer needs wavefront reconstruction [14][15][16][17][18]. The multi-dimensional optimization algorithm has a significant impact on the performance of the SLAO system. Although various algorithms have been proposed for wavefront correction, the stochastic parallel gradient descent (SPGD) algorithm is the most widely used in SLAO owing to its simple model, easy implementation and small number of parameters [19,20].
However, SPGD converges slowly and easily becomes trapped in local extrema, which limits its practical applications, especially in complex real-time systems [21]. To address these problems, several modifications of SPGD have been proposed to speed up convergence (reduce the number of iterations) and/or avoid local extrema. Lachinova et al. proposed the decoupled SPGD (DSPGD) algorithm to improve convergence efficiency when compensating atmospheric phase aberrations in a tiled fiber array system [22]. However, DSPGD requires prior knowledge of the performance metrics, which limits its applicability. Gao et al. proposed a modified SPGD based on finite-memory updating rules and the frozen hypothesis to correct rapidly changing aero-optical aberrations in an AO system [23]. However, the modification increases the complexity of the algorithm, and its improvement is strongly affected by the perturbation. Wu et al. developed a multi-perturbation SPGD with a fast-descent mode and a modal-basis updating mode to enhance the effectiveness of the algorithm [24]. However, this method needs to split the incoming beam into N sub-beams and use N wavefront correctors, which increases the complexity of the optical system. Hu et al. proposed an adaptive SPGD (ASPGD) integrating momentum and adaptive gain coefficient estimation to control a fast steering mirror (FSM) for efficient fiber coupling [25]. ASPGD avoids converging to local extremum points and accelerates SPGD to some extent. Yang et al. added pattern recognition to SPGD to detect and prevent trapping in a local extremum in an incoherent beam combining system [26]. Song et al. modified SPGD with a momentum term derived from the Newtonian equation to improve the convergence speed and disturbance immunity of coherent beam combining [27]. Ma et al. proposed an improved algorithm, adaptive gradient estimation SPGD, for beam clean-up of a solid-state laser to obtain high output beam quality and increase convergence speed and algorithm stability [28]. In summary, although the above studies obtained promising results, most of them target specific optical problems, have not been applied to CFSOC systems, and cannot be used directly to improve ME and decrease BER. The feasibility of these algorithms in CFSOC systems requires further verification.
In order to solve the above problems, this study first analyzes the ME and BER of the CFSOC system according to coherent communication theory, and establishes the relationship between system performance indicators and the fitness of the optimization algorithm in SLAO. Inspired by Nesterov-accelerated adaptive moment estimation (Nadam) in deep learning [29], a novel algorithm named NadamSPGD is proposed. The proposed NadamSPGD combines Nesterov's accelerated gradient (NAG) momentum and the adaptive gain coefficients with SPGD to improve the correction speed and robustness of SLAO without noticeably increasing the complexity of the algorithm. The wavefront correction effect of the proposed algorithm is analyzed through numerical simulations and laboratory experiments. The results demonstrate that NadamSPGD can significantly increase the convergence speed and robustness in SLAO, and the negative influence of atmospheric turbulence on ME and BER can be efficiently suppressed, which is of great significance to the CFSOC system.
The structure of this paper is as follows. In Section 2, the system models and working principles of CFSOC and SLAO are described and analyzed. In Section 3, the basic principles of the SPGD algorithm and the Nadam optimizer are introduced, and a novel hybrid algorithm, NadamSPGD, that makes full use of their characteristics is proposed. In Section 4, a phase aberration model based on Zernike polynomials is established, and related simulations and comparisons are presented to demonstrate the improved performance of NadamSPGD in the CFSOC system. We also performed a 97-element closed-loop SLAO experiment to verify the feasibility of the proposed algorithm. Finally, Section 5 draws the conclusions.

Photonics 2022, 9, 77 3 of 15

CFSOC System Model with SLAO
The architecture of a typical CFSOC system with SLAO is illustrated in Figure 1. At the transmitting terminal, the laser source is modulated to generate a carrier signal. At the receiving terminal, the received optical signal is mixed with the local oscillator light to generate an intermediate-frequency signal, which a demodulator and a digital signal processor then process further. As the laser propagates through the atmospheric link, its wavefront is distorted by atmospheric turbulence. The SLAO system is introduced at the receiving terminal to compensate for the influence of atmospheric turbulence and improve the quality of the optical signal.
The schematic of the SLAO system, composed of a beam steering unit (BSU) and a high-order aberration correction unit (HCU), is shown in Figure 2. In the BSU, large-scale beam deviations such as tilt and jitter are corrected by an FSM that rapidly captures and tracks the beam. In the HCU, the laser carrier signal is reflected by the deformable mirror (DM) and then divided into two beams by a beam splitter (BS). A high-speed camera (HSC) captures speckle images, and the current performance index of the system is obtained by calculating the energy concentration rate of the images. The selected optimization algorithm is then executed in the high-order aberration correction computer (HCC), which generates a voltage control signal for the DM according to the performance index. Finally, high-voltage amplifiers amplify the signal to a suitable voltage range to drive the DM and correct the distorted wavefront. After SLAO correction, the performance of the CFSOC system improves, with higher ME and lower BER. In this study, we consider only high-order aberration correction [9,16].
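The text does not specify how the energy concentration rate is computed from the HSC image. As a hedged illustration, one common choice is the fraction of total image intensity that falls inside a small circular window around the intensity centroid. The function name `energy_concentration` and its `radius` parameter are hypothetical; the window actually used by the authors may differ.

```python
def energy_concentration(image, radius):
    """Fraction of total intensity within `radius` pixels of the intensity centroid.

    `image` is a list of rows of non-negative intensities; `radius` is a
    hypothetical window size, not a value taken from the paper.
    """
    total = sum(sum(row) for row in image)
    if total <= 0:
        return 0.0
    # Intensity-weighted centroid (row index cy, column index cx).
    cy = sum(i * val for i, row in enumerate(image) for val in row) / total
    cx = sum(j * val for row in image for j, val in enumerate(row)) / total
    inside = sum(
        val
        for i, row in enumerate(image)
        for j, val in enumerate(row)
        if (i - cy) ** 2 + (j - cx) ** 2 <= radius ** 2
    )
    return inside / total


spot = [[0.0] * 9 for _ in range(9)]
spot[4][4] = 1.0                       # all energy in one pixel (well-corrected spot)
flat = [[1.0] * 9 for _ in range(9)]   # energy spread uniformly (strong speckle)
```

A compact spot scores 1, while a uniformly spread image scores only the fraction of pixels inside the window, so the metric rises as the DM concentrates energy.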

DM Model in SLAO
We have designed and manufactured a 97-element continuous surface DM (CSDM) as the wavefront corrector in this study. By changing the surface shape of the mirror in real time according to the control voltages, the CSDM can efficiently correct wavefront aberrations. Generally, the influence function $V_j(x, y)$ of the jth CSDM actuator is modeled with a Gaussian function [30,31]:

$$V_j(x,y)=\exp\left\{\ln\omega\left[\frac{\sqrt{(x-x_j)^2+(y-y_j)^2}}{d}\right]^{\alpha}\right\}\tag{1}$$

where ω is the coupling coefficient determined by the size of the electrode driver and the CSDM, $(x_j, y_j)$ is the center coordinate of the jth actuator, α is the Gaussian index and d is the normalized spacing between adjacent actuators. The phase compensation ϕ(x, y) produced by the CSDM with 97 actuators is expressed as:

$$\phi(x,y)=\sum_{j=1}^{97}u_j\,V_j(x,y)\tag{2}$$

where $u_j$ is the voltage of the jth actuator, limited to the maximum allowed voltage range. Equation (2) shows that ϕ(x, y) is linear in the voltages applied to the actuators.
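The Gaussian influence model and its linear superposition can be sketched numerically to make the linearity claim concrete. This minimal sketch assumes a hypothetical 3 × 3 actuator grid with unit spacing (the 97-element layout is not given in the text), ω = 0.15 and α = 2; these values are illustrative, not the parameters of the authors' CSDM.

```python
import math


def influence(x, y, xj, yj, omega=0.15, alpha=2.0, d=1.0):
    """Gaussian influence function V_j(x, y) of Eq. (1).

    omega, alpha and d are illustrative defaults, not the authors' CSDM values.
    """
    r = math.hypot(x - xj, y - yj)
    return math.exp(math.log(omega) * (r / d) ** alpha)


def dm_phase(x, y, centers, voltages, **kwargs):
    """Compensation phase of Eq. (2): voltage-weighted sum of influence functions."""
    return sum(u * influence(x, y, xj, yj, **kwargs)
               for (xj, yj), u in zip(centers, voltages))


# Hypothetical 3 x 3 actuator layout with unit spacing (stand-in for the 97-element CSDM).
centers = [(float(i), float(j)) for i in range(3) for j in range(3)]
voltages = [0.2, -0.1, 0.4, 0.0, 0.3, -0.2, 0.1, 0.05, -0.3]
```

Each influence function equals 1 at its own actuator and ω at a neighbor one spacing away, and doubling every voltage exactly doubles the phase, which is the linearity that Equation (2) expresses.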

Theoretical Basis of the CFSOC
In a CFSOC system, the ME and BER are effective indicators for evaluating performance. Assuming that the local oscillator (LO) is a plane wave and the intensity of the received optical signal (OS) is uniform, the theory of coherent detection gives the total optical power of the combined beam, integrated over the detector surface S, as [32]:

$$P(t)=\iint_S\left\{A_S^2+A_O^2+2A_SA_O\cos\left[2\pi\left(f_S-f_O\right)t+\Delta\phi\right]\right\}dS\tag{3}$$

where Δϕ = $\phi_S-\phi_O$ is the phase difference between the OS and the LO, $f_S$ and $f_O$ are their frequencies, and $A_S$ and $A_O$ are the optical amplitudes of the OS and the LO, respectively. Generally, the symbol duration in a CFSOC system is less than 1 ns, whereas the characteristic time of turbulence, set by the Greenwood frequency (GF), is on the millisecond scale. According to Taylor's frozen-turbulence hypothesis, the phase aberration Δϕ caused by atmospheric turbulence can therefore be considered frozen during the detection time and expressed as:

$$\Delta\phi(r,t)=\phi(r)+\phi(t)\tag{4}$$

where ϕ(r) represents the spatial part of the phase aberration caused by atmospheric turbulence, which is time-independent, and ϕ(t) denotes the temporal part of the phase of the optical signal, which is independent of the spatial coordinates. In homodyne detection, $f_S=f_O$, and ME can be defined as:

$$\mathrm{ME}=\frac{\left|\iint_S A_SA_O\exp\left[j\phi(r)\right]dS\right|^2}{\iint_S A_S^2\,dS\iint_S A_O^2\,dS}\tag{5}$$

According to Equation (5), ME is approximately equal to the Strehl ratio (SR) of the far-field image, defined as the ratio of the far-field encircled energy to the diffraction-limited encircled energy [33].
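For the uniform-amplitude case assumed here, the link between the ME definition, the SR and the residual phase variance can be made explicit with a short second-order (Maréchal-type) expansion. This is a standard small-aberration approximation sketched for illustration; ⟨·⟩ denotes the average over the detector area S and fourth-order terms are dropped.

$$
\begin{aligned}
\mathrm{ME} &= \left|\frac{1}{S}\iint_S e^{j\phi(r)}\,dS\right|^2 = \left|\langle e^{j\phi}\rangle\right|^2\\
&\approx \left|1 + j\langle\phi\rangle - \tfrac{1}{2}\langle\phi^2\rangle\right|^2
= \left(1-\tfrac{1}{2}\langle\phi^2\rangle\right)^2 + \langle\phi\rangle^2\\
&\approx 1 - \left(\langle\phi^2\rangle - \langle\phi\rangle^2\right) = 1 - \sigma^2 \approx e^{-\sigma^2}
\end{aligned}
$$

The mean phase ⟨ϕ⟩ (piston) cancels, so only the phase variance σ² about the mean reduces ME, which is why piston can be left to the BSU and ignored by the HCU.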
Considering only the spatial and temporal errors, the expected residual wavefront phase variance after AO compensation with a CSDM can be expressed as [34][35][36]:

$$\sigma^2=\sigma_{\mathrm{fit}}^2+\sigma_{\mathrm{time}}^2=\alpha_F\left(\frac{d}{r_0}\right)^{5/3}+\kappa\left(\frac{f_G}{f_{3\mathrm{dB}}}\right)^{5/3}\tag{6}$$

where $\sigma_{\mathrm{fit}}^2$ is the spatial term, representing the wavefront fitting error caused by the limited number of CSDM actuators; $\sigma_{\mathrm{time}}^2$ is the temporal error arising from the mismatch between fast-changing atmospheric turbulence and the finite closed-loop control bandwidth (CLCB) of the SLAO system; $\alpha_F$ is the fitting error coefficient; $r_0$ denotes the atmospheric coherence length; d is the equivalent actuator spacing projected onto the entrance pupil of the receiving antenna; κ is a constant equal to 1 for a plane wave; $f_{3\mathrm{dB}}$ denotes the CLCB; and $f_G$ represents the GF.
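The two error terms of the residual-variance model are easy to evaluate numerically. The sketch below assumes an illustrative fitting coefficient α_F = 0.28 and κ = 1 (plane wave); the text does not state the α_F used, so treat it as a placeholder. The function name `residual_variance` is hypothetical.

```python
def residual_variance(d_over_r0, fg_over_f3db, alpha_f=0.28, kappa=1.0):
    """Return (fitting, temporal, total) residual phase variance of Eq. (6) in rad^2.

    alpha_f = 0.28 is an illustrative placeholder; kappa = 1 for a plane wave.
    """
    sigma2_fit = alpha_f * d_over_r0 ** (5.0 / 3.0)
    sigma2_time = kappa * fg_over_f3db ** (5.0 / 3.0)
    return sigma2_fit, sigma2_time, sigma2_fit + sigma2_time


# Example: projected actuator spacing equal to r0, control bandwidth ten times the GF.
fit, temporal, total = residual_variance(1.0, 0.1)
```

Because of the 5/3 exponents, halving either d/r0 or f_G/f_3dB reduces the corresponding term by a factor of 2^(5/3) ≈ 3.17.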
Thus, the relationship between ME and the average residual wavefront variance can be expressed as:

$$\mathrm{ME}\approx\exp\left(-\sigma^2\right)\tag{7}$$

The BER of synchronous binary phase-shift keying (BPSK) coherent detection can be expressed as:

$$\mathrm{BER}=\frac{1}{2}\,\mathrm{erfc}\left(\sqrt{2\delta N_p\,\mathrm{ME}}\right)\tag{8}$$

where erfc is the complementary error function, δ is the quantum efficiency of the receiver detector and $N_p$ represents the number of photons received within one signal bit.
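The ME and BER relations chain together: a residual variance gives ME, and ME gives BER. The sketch below assumes the forms ME ≈ exp(−σ²) (a Maréchal-type approximation) and BER = ½ erfc(√(2δN_p·ME)) (the standard BPSK homodyne expression weighted by ME), with δ = 1 and N_p = 12 as in the simulations of Section 4. The function names are hypothetical.

```python
import math


def mixing_efficiency(sigma2):
    """Eq. (7), assumed Marechal-type form: ME ~ exp(-sigma^2)."""
    return math.exp(-sigma2)


def ber_bpsk(me, n_p=12, delta=1.0):
    """Eq. (8), assumed form: BER = 0.5 * erfc(sqrt(2 * delta * N_p * ME))."""
    return 0.5 * math.erfc(math.sqrt(2.0 * delta * n_p * me))


for me in (0.2277, 0.8, 0.9):
    print(f"ME = {me:.4f} -> BER = {ber_bpsk(me):.2e}")
```

Under these assumptions the BER lies between 10^−4 and 10^−3 at the uncorrected ME of 0.2277 and falls below 10^−10 once ME reaches 0.9, illustrating how steeply BER depends on ME.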

Fitness in SLAO
It is necessary to link the fitness of the optimization algorithm in SLAO to the evaluation indicators of the CFSOC system. First, we assume that the initial wavefront aberration of the laser carrier signal after the atmospheric channel is $\phi_0(r,\theta)$, and that the solutions of the algorithm are 97-dimensional vectors u = {$u_1$, $u_2$, ..., $u_{97}$}, where each component represents the control voltage of one CSDM actuator. In SLAO, the solutions are continuously updated and the compensation phase ϕ(r, θ) is generated according to Equation (2). The residual phase aberration is therefore $\phi_e(r,\theta)=\phi_0(r,\theta)-\phi(r,\theta)$. Based on the analysis in Section 2.3, this study takes the SR of $\phi_e(r,\theta)$ as the fitness J of our algorithm to simplify the calculation [33,36,37]. In the following sections, the aim of the SLAO system is to maximize J toward its ideal value. When the maximum J is obtained, we obtain the optimum voltage vector u and the best ME of the CFSOC system.
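Under the uniform-amplitude assumption of Section 2.3, the SR of the residual phase can be approximated by the squared magnitude of the pupil-averaged complex field. A minimal sketch, assuming the phases are sampled at a list of equally weighted pupil points; the function name `fitness` is hypothetical and this is not the authors' implementation.

```python
import cmath


def fitness(phi0, phi_dm):
    """Approximate J = SR of the residual phase phi_e = phi0 - phi_dm.

    Both arguments are phase samples (rad) at the same equally weighted pupil points.
    """
    if len(phi0) != len(phi_dm) or not phi0:
        raise ValueError("phase lists must be non-empty and of equal length")
    field = sum(cmath.exp(1j * (a - b)) for a, b in zip(phi0, phi_dm)) / len(phi0)
    return abs(field) ** 2


phi0 = [0.3, -0.2, 1.0, 0.5]  # illustrative aberrated phase samples (rad)
```

Perfect compensation gives J = 1, a pure piston offset leaves J unchanged, and two equal-area regions that are π out of phase drive J to zero.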

Conventional SPGD in SLAO
When conventional SPGD is used in SLAO, the wavefront aberration is compensated in the following steps. First, small random perturbation voltages $\Delta u^{(k)}=\{\Delta u_1,\Delta u_2,\cdots,\Delta u_{97}\}$ following a Bernoulli distribution are applied to all CSDM actuators simultaneously to estimate the gradient. The perturbations have a fixed amplitude, i.e., $|\Delta u_j^{(k)}|=\Delta u$. Then, using the perturbed metric values $J_\pm^{(k)}=J\left(u^{(k-1)}\pm\Delta u^{(k)}\right)$, the variation of the performance metric is calculated as $\Delta J^{(k)}=J_+^{(k)}-J_-^{(k)}$. The CSDM control voltages are then updated iteratively as:

$$u^{(k)}=u^{(k-1)}+\gamma\,\Delta J^{(k)}\,\Delta u^{(k)}\tag{9}$$

where the superscript k denotes the iteration number and γ is a positive gain coefficient. According to Equation (9), $u^{(k)}$ is updated along the estimated gradient direction, and the performance metric J reaches an extremum after multiple iterations. Conventional SPGD considers only the current update vector, which makes it prone to becoming trapped in a local extremum and to converging slowly when the gradient becomes flat or the curvature is large. Another limitation of SPGD is that all optimization parameters share a single gain coefficient, and it is difficult to find a suitable value for it in real-world wavefront correction systems. A feasible solution to both problems is to add a momentum term that accumulates the gradients encountered in previous updates and to use variable gain coefficients.
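The SPGD iteration can be sketched in a few lines. This is a minimal illustration, not the authors' code: `toy_metric` is a hypothetical smooth stand-in for J with a single maximum at a random target voltage vector, the problem is reduced to 10 actuators, and the gain γ = 20 and Δu = 0.1 are chosen only so that the toy case converges.

```python
import math
import random


def toy_metric(u, target):
    """Hypothetical smooth stand-in for J: equals 1 when u == target, decays otherwise."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, target)) / len(u))


def spgd(metric, u0, gain=20.0, du=0.1, iters=500, seed=0):
    """Conventional SPGD, Eq. (9): u(k) = u(k-1) + gain * dJ(k) * du(k)."""
    rng = random.Random(seed)
    u = list(u0)
    history = [metric(u)]
    for _ in range(iters):
        # Bernoulli perturbation: each actuator independently gets +du or -du.
        pert = [rng.choice((-du, du)) for _ in u]
        j_plus = metric([a + p for a, p in zip(u, pert)])
        j_minus = metric([a - p for a, p in zip(u, pert)])
        dj = j_plus - j_minus
        # Move every actuator along its own perturbation, weighted by the metric change.
        u = [a + gain * dj * p for a, p in zip(u, pert)]
        history.append(metric(u))
    return u, history


rng = random.Random(1)
target = [rng.uniform(-1.0, 1.0) for _ in range(10)]
u_final, history = spgd(lambda v: toy_metric(v, target), [0.0] * 10)
print(f"J: {history[0]:.3f} -> {history[-1]:.3f}")
```

Only the current perturbation and metric change enter each update, and one scalar gain serves every actuator; these are the two limitations the momentum and adaptive-gain terms are meant to address.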

NadamSPGD
The Nesterov-accelerated adaptive moment estimation (Nadam) optimizer, proposed by Dozat at the International Conference on Learning Representations (ICLR) in 2016, is an extension of adaptive moment estimation (Adam) that uses NAG momentum. Adam extends gradient descent for neural networks by maintaining exponentially decaying estimates of the first and second moments of the gradient, which adds inertia to the updates and automatically adapts the learning rate of each optimized parameter [38]. NAG is an extension of classical momentum in which the update uses the gradient at the projected (look-ahead) position of the parameter rather than at its current value [39]. This slows the search as it approaches the optimum, rather than overshooting the minimum as classical momentum can. Nadam combines the advantages of Adam and NAG and can improve the performance of gradient-based optimization. Many studies have reported that, in most cases, Nadam improves convergence speed and the quality of learned models compared with related algorithms such as RMSProp, Momentum and Adam.
Inspired by Nadam, in view of the above two deficiencies in SPGD, we incorporate the Nesterov momentum and the adaptive gain coefficient estimation into the conventional SPGD, and propose a novel NadamSPGD algorithm to accelerate the optimization speed in SLAO.
First, the gradient of NadamSPGD at the current step, $g^{(k)}$, is approximated from the measured metric variation $\Delta J^{(k)}$ and the applied perturbation $\Delta u^{(k)}$ (Equation (10)) [25]. Next, a first momentum term is introduced into SPGD to accelerate its convergence. An exponentially decaying moving average of the gradients is used, which gives higher weight to more recent values; the momentum $m^{(k)}$ is updated with the hyper-parameter µ (Equation (11)). Then, an initialization bias-correction strategy is used to offset the instability that initializing $m^{(k)}$ to zero may create. Because momentum is most effective with a warm-up schedule, we also parameterize µ by k, as in the NAG algorithm, which gives the bias-corrected Nesterov momentum $\hat{m}^{(k)}$ (Equation (12)). $\hat{m}^{(k)}$ is the final form of the Nesterov momentum term in NadamSPGD. It uses the change from the last iteration to calculate the projected position of the variable, and then uses the derivative at the projected position to calculate the new position. Computing the gradient at the projected position is equivalent to adding a correction factor to the accumulated acceleration, which intuitively yields a better gradient update. Nesterov momentum is known to reduce the number of iterations required and improve the convergence rate of optimization algorithms.
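For orientation, one common formulation of Dozat's Nadam, transcribed to the SPGD notation, is sketched below; it may differ in detail from the authors' Equations (10) to (12). Two points are assumptions: the gradient estimate $g^{(k)}\approx\Delta J^{(k)}\Delta u^{(k)}$, and the use of the constant µ in the moving average, with the scheduled $\mu^{(k)}$ appearing only in the bias correction.

$$
\begin{aligned}
g^{(k)} &\approx \Delta J^{(k)}\,\Delta u^{(k)}\\
m^{(k)} &= \mu\,m^{(k-1)} + (1-\mu)\,g^{(k)}\\
\mu^{(k)} &= \mu\left(1-0.5\times0.96^{k/250}\right)\\
\hat{m}^{(k)} &= \frac{\mu^{(k+1)}\,m^{(k)}}{1-\prod_{i=1}^{k+1}\mu^{(i)}} + \frac{\left(1-\mu^{(k)}\right)g^{(k)}}{1-\prod_{i=1}^{k}\mu^{(i)}}
\end{aligned}
$$

To first order, Bernoulli perturbations of amplitude Δu give $\mathbb{E}[\Delta J^{(k)}\Delta u_j^{(k)}]\approx2(\Delta u)^2\,\partial J/\partial u_j$, so dividing by $2(\Delta u)^2$ would make the estimate unbiased; in a normalized update such as Nadam's, this constant factor largely cancels apart from the effect of ε.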
Furthermore, to overcome the single gain coefficient of SPGD, we adapt the gain of each parameter by introducing a second momentum term with the hyper-parameter v (Equation (13)), where v controls the exponential decay rate of the moving average. The biased second momentum term $n^{(k)}$ is then bias-corrected by Equation (14) to compensate for its zero initialization at the start of the search, yielding the bias-corrected estimate $\hat{n}^{(k)}$. $\hat{n}^{(k)}$ accumulates the weighted squares of past gradients and thus estimates the uncentered variance of the gradients. During updating, the momentum term is divided by $\sqrt{\hat{n}^{(k)}}$, which adaptively sets a suitable gain for each parameter.
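The second-moment terms and the resulting voltage update can be sketched in the standard Nadam form, written as gradient ascent because J is maximized. All operations are element-wise over the 97 actuators, and the exact form used by the authors in Equations (13) to (15) may differ.

$$
\begin{aligned}
n^{(k)} &= v\,n^{(k-1)} + (1-v)\left(g^{(k)}\right)^2\\
\hat{n}^{(k)} &= \frac{n^{(k)}}{1-v^{k}}\\
u^{(k)} &= u^{(k-1)} + \eta\,\frac{\hat{m}^{(k)}}{\sqrt{\hat{n}^{(k)}}+\varepsilon}
\end{aligned}
$$

The effective per-actuator gain is $\eta/(\sqrt{\hat{n}_j^{(k)}}+\varepsilon)$: it shrinks for actuators whose gradient estimates are persistently large and grows for those whose estimates are small.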
Based on the above, the CSDM control voltage vector is updated in NadamSPGD according to Equation (15), where η is the learning rate and ε is a small constant, usually set to $10^{-8}$, that avoids division by zero. The implementation of the NadamSPGD algorithm for SLAO is described in Algorithm 1.
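One practical consequence of dividing by $\sqrt{\hat{n}^{(k)}}$ is that the step size becomes nearly independent of the overall scale of the gradient estimate. The sketch below checks this for a single scalar parameter, assuming a Dozat-style Nadam formulation and the values η = 0.1, µ = 0.999, v = 0.99 and ε = 10^−8 quoted in Section 4; the function `nadam_steps` is hypothetical.

```python
import math


def nadam_steps(grads, eta=0.1, mu=0.999, v=0.99, eps=1e-8):
    """Return the Nadam update applied at each step for one scalar parameter.

    Assumed formulation: constant mu in the moving average, scheduled
    mu(k) = mu * (1 - 0.5 * 0.96**(k/250)) in the Nesterov bias correction.
    """
    m = n = 0.0
    prod = 1.0  # running product mu(1) * ... * mu(k)
    steps = []
    for k, g in enumerate(grads, start=1):
        mu_k = mu * (1 - 0.5 * 0.96 ** (k / 250))
        mu_next = mu * (1 - 0.5 * 0.96 ** ((k + 1) / 250))
        prod *= mu_k
        m = mu * m + (1 - mu) * g          # first moment
        n = v * n + (1 - v) * g * g        # second moment
        m_hat = mu_next * m / (1 - prod * mu_next) + (1 - mu_k) * g / (1 - prod)
        n_hat = n / (1 - v ** k)
        steps.append(eta * m_hat / (math.sqrt(n_hat) + eps))
    return steps


grads = [0.3, -0.1, 0.25, 0.05, -0.2]
small = nadam_steps(grads)
large = nadam_steps([100.0 * g for g in grads])  # same sequence, 100x larger scale
```

Scaling every gradient by 100 leaves the steps unchanged up to the effect of ε, and the first step has magnitude close to η. This is one plausible reason, not a proof, why NadamSPGD tolerates a wide range of Δu.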

Algorithm 1 Pseudo code of the NadamSPGD algorithm.
Input: the learning rate η, the hyper-parameters µ and v, the constant ε, the amplitude of the random perturbation voltages ∆u, and the maximum number of iterations N.
Output: the control voltage vector of the CSDM, u = {$u_1$, $u_2$, ..., $u_{97}$}.
1: Initialize the control voltage vector $u^{(0)}$, the first momentum term $m^{(0)}$ and the second momentum term $n^{(0)}$
2: for k = 1, ..., N do
3:     Randomly generate the perturbation voltages $\Delta u^{(k)}$ obeying the Bernoulli distribution
4:     Obtain the evaluation functions under perturbation, $J_\pm^{(k)}=J\left(u^{(k-1)}\pm\Delta u^{(k)}\right)$
5:     Obtain the change of the evaluation function, $\Delta J^{(k)}=J_+^{(k)}-J_-^{(k)}$
6:     Calculate the gradient $g^{(k)}$ (see Equation (10))
7:     Calculate the bias-corrected Nesterov momentum $\hat{m}^{(k)}$ (see Equations (11) and (12))
8:     Calculate the bias-corrected second momentum term $\hat{n}^{(k)}$ (see Equations (13) and (14))
9:     Update the control voltage $u^{(k)}$ (see Equation (15))
10: end for

In principle, NadamSPGD improves on SPGD in both gradient estimation and gain adaptation. The Nesterov momentum term can improve convergence and suppress oscillations, and the adaptive gain coefficients are adjusted in real time during iteration, which can speed up correction. NadamSPGD can thus be regarded as an extension of SPGD that makes fuller use of gradient information without noticeably increasing implementation complexity. The following simulations focus on the correction speed and robustness of the two algorithms.
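A runnable sketch of Algorithm 1 follows. It assumes the gradient estimate $g^{(k)}=\Delta J^{(k)}\Delta u^{(k)}$, the standard Nadam bias corrections with the $\mu^{(k)}$ warm-up schedule of Reference [29], and a hypothetical smooth toy metric over 10 actuators in place of the 97-element SR computation. The defaults η = 0.1, µ = 0.999, v = 0.99, ε = 10^−8 and Δu = 0.5 mirror the simulation settings reported in Section 4. It is an illustration, not the authors' implementation.

```python
import math
import random


def toy_metric(u, target):
    """Hypothetical smooth stand-in for J (SR); equals 1 at u == target."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(u, target)) / len(u))


def nadam_spgd(metric, u0, eta=0.1, mu=0.999, v=0.99, eps=1e-8,
               du=0.5, iters=400, seed=0):
    rng = random.Random(seed)
    u = list(u0)
    m = [0.0] * len(u)          # first momentum term (step 1)
    n = [0.0] * len(u)          # second momentum term (step 1)
    prod = 1.0                  # running product mu(1) * ... * mu(k)
    history = [metric(u)]
    for k in range(1, iters + 1):                                   # step 2
        pert = [rng.choice((-du, du)) for _ in u]                   # step 3
        dj = (metric([a + p for a, p in zip(u, pert)])
              - metric([a - p for a, p in zip(u, pert)]))          # steps 4-5
        g = [dj * p for p in pert]                                  # step 6 (assumed estimator)
        mu_k = mu * (1 - 0.5 * 0.96 ** (k / 250))
        mu_next = mu * (1 - 0.5 * 0.96 ** ((k + 1) / 250))
        prod *= mu_k
        for j, gj in enumerate(g):
            m[j] = mu * m[j] + (1 - mu) * gj                        # step 7
            m_hat = (mu_next * m[j] / (1 - prod * mu_next)
                     + (1 - mu_k) * gj / (1 - prod))
            n[j] = v * n[j] + (1 - v) * gj * gj                     # step 8
            n_hat = n[j] / (1 - v ** k)
            u[j] += eta * m_hat / (math.sqrt(n_hat) + eps)          # step 9 (ascent on J)
        history.append(metric(u))
    return u, history


rng = random.Random(1)
target = [rng.uniform(-1.0, 1.0) for _ in range(10)]
u_final, history = nadam_spgd(lambda x: toy_metric(x, target), [0.0] * 10)
print(f"J: {history[0]:.3f} -> {history[-1]:.3f}")
```

The per-actuator normalization replaces SPGD's single gain γ, so the same η works even though the raw gradient estimates shrink as J approaches its maximum.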

Simulation Analysis
Zernike polynomials are widely used to describe the distorted wavefront caused by atmospheric turbulence. The wavefront $\phi_0(r,\theta)$ can be expanded in Zernike polynomials in polar coordinates [40]:

$$\phi_0(r,\theta)=\sum_{i=0}^{\infty}a_i\,Z_i(r,\theta)\tag{16}$$

where $a_i$ is the coefficient of the ith Zernike polynomial $Z_i(r,\theta)$. The 0th term and $Z_1(r,\theta)$, $Z_2(r,\theta)$ represent the piston and the tilts along the X and Y directions, respectively, which can be corrected directly by the BSU. In our simulations, the wavelength λ is set to 635 nm and D/$r_0$ = 10, and the 3rd to 35th Zernike terms are used to form the distorted wavefront that imitates atmospheric turbulence. The randomly generated initial Zernike coefficients $a_3$ to $a_{35}$ are given in Figure 3a. The corresponding distorted phase of the original wavefront and the original point spread function (PSF) are shown in Figure 3b,c, respectively. The original wavefront is seriously distorted, and the initial ME is 0.2277. In the following simulations, the voltages of the 97-element CSDM are calculated by each algorithm to compensate this wavefront aberration, and the performance of the algorithms is compared. For ease of observation, the simulation results report the optimization objective as ME. Since computation time depends on the hardware, the performance improvement of algorithms is generally evaluated by comparing the number of iterations; in our study, we follow this common practice [16][17][18][19][20][21].

In Figure 4, the residual wavefront aberrations and PSFs after correction with NadamSPGD at different iteration counts are presented. The ME increases from 0.2277 to 0.9099 after 400 iterations, i.e., about four times its value before compensation. Clearly, most of the distortions are well compensated by NadamSPGD.
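The link between a Zernike coefficient, the residual phase variance and the SR-based ME can be checked on a single term. The sketch assumes the Noll-normalized (unit-RMS) defocus polynomial √3(2r² − 1) on a uniformly illuminated unit pupil; the paper's own indexing and normalization are not stated, so this is illustrative. For pure defocus r² is uniformly distributed over the disk, which gives the closed form SR = [sin(a√3)/(a√3)]², and the Maréchal approximation gives exp(−a²).

```python
import cmath
import math


def defocus(r, theta):
    """Unit-RMS (Noll-normalized) defocus sqrt(3) * (2r^2 - 1) on the unit pupil."""
    return math.sqrt(3.0) * (2.0 * r * r - 1.0)


def strehl_on_grid(phase_fn, samples=201):
    """Approximate SR as |<exp(j*phase)>|^2 over a uniformly illuminated unit pupil."""
    acc, count = 0j, 0
    for i in range(samples):
        x = -1.0 + 2.0 * i / (samples - 1)
        for k in range(samples):
            y = -1.0 + 2.0 * k / (samples - 1)
            r = math.hypot(x, y)
            if r <= 1.0:
                acc += cmath.exp(1j * phase_fn(r, math.atan2(y, x)))
                count += 1
    return abs(acc / count) ** 2


a = 0.3  # defocus coefficient in rad RMS (illustrative)
sr = strehl_on_grid(lambda r, t: a * defocus(r, t))
x = a * math.sqrt(3.0)
sr_exact = (math.sin(x) / x) ** 2   # closed form for pure defocus on a uniform disk
sr_marechal = math.exp(-a * a)      # Marechal approximation exp(-sigma^2)
```

The grid estimate, the closed form and the Maréchal value agree closely at this small aberration, which is the regime in which treating ME as SR is most accurate.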
Next, we further verify the performance improvement of the proposed algorithm by simulation. Considering the randomness of the algorithms, we execute each simulation 100 times. The optimization curves of ME and system BER versus the number of iterations for SPGD and NadamSPGD under their optimal parameters are illustrated in Figure 5. The learning rate is η = 0.1 and the hyper-parameters are µ = 0.999 and v = 0.99, with $\mu^{(k)}=\mu\left(1-0.5\times0.96^{k/250}\right)$ as suggested in Reference [29]; the constant ε = $10^{-8}$, the amplitude of the random perturbation voltages ∆u = 0.5, the quantum efficiency δ = 1 and $N_p$ = 12, and the BER is calculated according to Equation (8).

From Figure 5, both SPGD and our algorithm effectively compensate the aberration, improve the ME and decrease the system BER. We take ME = 0.8 as a threshold to compare the behavior of the two algorithms intuitively [41]. SPGD reaches the threshold after at least 166 iterations and at most 242 iterations in the worst case, with an average of 193 iterations. NadamSPGD converges after at least 81 iterations and at most 163 iterations, with an average of 112 iterations, which is 58.03% of the SPGD average. In addition, SPGD shows a large standard deviation across runs, because its updates depend only on the random perturbation at each iteration and the current gradient. NadamSPGD considers both the current gradient and the historical gradients during iteration and thus reduces the impact of randomness. In summary, NadamSPGD not only converges faster than SPGD but is also more robust to the randomness of the perturbations. Figure 5b shows that the system BER is substantially improved after the wavefront aberration is compensated with NadamSPGD: the BER drops from approximately $10^{-3}$ to $10^{-10}$ after 149 iterations. With SPGD, by contrast, the BER is not suppressed below $10^{-10}$ until 262 iterations.
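The relative figures quoted for Figure 5 follow directly from the average iteration counts needed to reach ME = 0.8:

$$
\frac{\bar{N}_{\mathrm{NadamSPGD}}}{\bar{N}_{\mathrm{SPGD}}}=\frac{112}{193}\approx0.5803,\qquad
1-0.5803\approx0.42,\qquad
\frac{193}{112}\approx1.72
$$

That is, NadamSPGD needs about 42% fewer iterations on average, or equivalently converges about 1.7 times as fast as SPGD when measured in iterations.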
The amplitude of the random perturbation voltages strongly influences the performance of gradient-based algorithms. To verify the robustness of the two algorithms to the perturbation amplitude, we evaluate SPGD and NadamSPGD under the same settings as in the previous simulations, except that ∆u is varied from 0.01 to 1. The results in Figure 6 show that SPGD is extremely sensitive to ∆u. When ∆u is 0.01, SPGD corrects extremely slowly (Figure 6a), and when ∆u increases to 0.1, the ME achieved by SPGD reaches only 0.3308 after 400 iterations (Figure 6b). When ∆u increases to 1, SPGD converges prematurely to a local optimum (Figure 6c). In contrast, changing ∆u has almost no effect on the correction performance of NadamSPGD, which still works well for ∆u ∈ [0.01, 1] (Figure 6d-f). These results demonstrate that NadamSPGD is robust over a wide range of ∆u values, which makes it easy to use in practical SLAO applications.
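A first-order Taylor expansion gives one heuristic explanation for the ∆u sensitivity of SPGD in Figure 6. For Bernoulli perturbations with $|\Delta u_j^{(k)}|=\Delta u$, the expected SPGD step scales with $(\Delta u)^2$:

$$
\begin{aligned}
\Delta J^{(k)} &\approx 2\,\nabla J\cdot\Delta u^{(k)}
\quad\Rightarrow\quad
\mathbb{E}\left[\gamma\,\Delta J^{(k)}\,\Delta u_j^{(k)}\right]\approx 2\gamma(\Delta u)^2\,\frac{\partial J}{\partial u_j},\\
\frac{\text{step}(\Delta u=0.01)}{\text{step}(\Delta u=0.5)} &\approx \left(\frac{0.01}{0.5}\right)^2 = 4\times10^{-4}
\end{aligned}
$$

In a Nadam-type update, both $\hat{m}^{(k)}$ and $\sqrt{\hat{n}^{(k)}}$ scale with $(\Delta u)^2$, so their ratio, and hence the step, is essentially unchanged (up to ε). This sketch ignores higher-order terms and measurement noise, so it does not by itself account for the premature convergence of SPGD at ∆u = 1.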
Photonics 2022, 9, x FOR PEER REVIEW 10 of 15
According to the Kolmogorov turbulence model, the ratio of the receiving antenna aperture D to the atmospheric coherence length $r_0$, $D/r_0$, characterizes the strength of atmospheric turbulence. To further explore the correction performance of the two algorithms under different turbulence strengths, we analyze the relationship between ME and the number of iterations for different values of $D/r_0$ in Figure 7.
Figure 7 shows that NadamSPGD effectively corrects turbulence of different strengths. As the turbulence strength increases, the correction gradually slows down. When $D/r_0 = 5$, NadamSPGD achieves an ME of 0.8 after 38 iterations, whereas SPGD needs 55 iterations to achieve the same correction. When $D/r_0$ increases to 15, NadamSPGD converges after 291 iterations, whereas SPGD requires 594 iterations. Thus, the gap between the numbers of iterations required by the two algorithms widens as the turbulence strength increases, and the advantage of NadamSPGD becomes more significant under stronger turbulence.
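The iteration counts quoted for Figure 7 can be set beside a standard Kolmogorov-turbulence measure. The sketch below uses Noll's result for the piston-removed phase variance over a circular aperture, $1.0299\,(D/r_0)^{5/3}$ rad², as a well-known illustration of why $D/r_0$ is a convenient turbulence-strength parameter (this formula is not taken from the paper), and computes the NadamSPGD-to-SPGD iteration ratio from the counts given in the text.

```python
# Iterations needed to reach ME = 0.8, as quoted in the text for Figure 7
iters = {5: {"SPGD": 55, "NadamSPGD": 38}, 15: {"SPGD": 594, "NadamSPGD": 291}}


def noll_piston_removed_variance(d_over_r0):
    """Noll's piston-removed Kolmogorov phase variance over the aperture, in rad^2."""
    return 1.0299 * d_over_r0 ** (5.0 / 3.0)


for d_r0, n in iters.items():
    ratio = n["NadamSPGD"] / n["SPGD"]          # fraction of SPGD iterations needed
    saved = n["SPGD"] - n["NadamSPGD"]          # absolute iterations saved
    print(f"D/r0={d_r0:>2}: sigma^2={noll_piston_removed_variance(d_r0):6.1f} rad^2, "
          f"Nadam/SPGD={ratio:.2f}, iterations saved={saved}")
```

The ratio falls from about 0.69 at $D/r_0 = 5$ to about 0.49 at $D/r_0 = 15$, which is the widening gap described above.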
Based on the theoretical analysis in Section 2.3, the GF of the atmosphere and the CLCB of SLAO severely affect the performance of the CFSOC system. With d = 0.06, $\alpha_F$ = 0.28, $r_0$ = 0.15 and κ = 1, the relationships between the ME, BER and CLCB under different GFs were obtained and are shown in Figure 8. As Figure 8 illustrates, under each GF, a higher CLCB of the SLAO yields a higher ME and a lower BER. Thus, a higher CLCB is necessary to guarantee the communication quality as the GF increases. Since the resonant frequency of the CSDM is very high (>5 kHz), its response time is extremely short, and its impact on the CLCB is negligible compared with the computational delay introduced by the algorithms [42]. As a result, the computational delay of the correction algorithm is the dominant factor in determining the CLCB. In general, a more efficient algorithm requires fewer iterations and therefore achieves a higher CLCB. To further demonstrate the ability of NadamSPGD in the time domain, we assume that the CLCB is inversely proportional to the number of iterations and that, considering the processing capacity of an FPGA- and GPU-based high-performance processing platform, NadamSPGD achieves a CLCB of 100 Hz in the previous simulations. The ME and BER at different GFs can then be calculated according to Equations (7) and (8); the results are shown in Figure 9.
Figure 9 illustrates that as the GF increases, the dynamic performance of SLAO degrades with both algorithms. However, the degradation is more severe for SPGD than for NadamSPGD, indicating that the proposed algorithm has better dynamic performance and is more suitable for real-time SLAO systems.
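The time-domain assumption can be made explicit with a little arithmetic. The sketch assumes that CLCB scales as the inverse of the iteration count and anchors NadamSPGD at 100 Hz; as an additional assumption of ours, it uses the average iteration counts to reach ME = 0.8 reported for Figure 5 (112 and 193), which places SPGD at roughly 58 Hz. Since Equations (7) and (8) are not restated in this section, it substitutes Greenwood's classic servo-lag estimate for a first-order loop, $\sigma^2 = (f_G/f_c)^{5/3}$ rad², together with the extended Maréchal approximation $\exp(-\sigma^2)$ as a rough proxy for the achievable ME. Both are stand-ins, not the paper's model, and the GF values are arbitrary examples.

```python
import math

# Assumption: CLCB is inversely proportional to the iterations needed for convergence,
# with NadamSPGD at 100 Hz; iteration counts are the Figure 5 averages (ME = 0.8).
ITERS = {"NadamSPGD": 112, "SPGD": 193}
CLCB_REF_HZ = 100.0


def clcb_hz(algorithm):
    """Implied closed-loop correction bandwidth of an algorithm, in Hz."""
    return CLCB_REF_HZ * ITERS["NadamSPGD"] / ITERS[algorithm]


def servo_lag_variance(f_g, f_c):
    """Greenwood's servo-lag phase variance (rad^2) for a first-order loop: (f_G / f_c)^(5/3)."""
    return (f_g / f_c) ** (5.0 / 3.0)


for f_g in (10.0, 20.0, 50.0):
    row = []
    for alg in ("NadamSPGD", "SPGD"):
        var = servo_lag_variance(f_g, clcb_hz(alg))
        row.append(f"{alg}: exp(-sigma^2)={math.exp(-var):.3f}")   # Marechal-type proxy
    print(f"f_G={f_g:4.0f} Hz  " + "  ".join(row))
```

At every GF the lower implied CLCB of SPGD gives a larger servo-lag variance, which matches the qualitative trend shown in Figure 9.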

Experiment
In this study, we also compare the two algorithms on our SLAO experimental platform to demonstrate the performance improvement of NadamSPGD in a real system. Figure 10 presents the setup and a photograph of the SLAO experimental platform.

As shown in Figure 10, the SLAO experimental platform is built on an auto-collimating optical system. A fiber-coupled laser source with a wavelength of 635 nm emits the laser beam. The beam is collimated by lens L1, reflected by the BS and expanded by lenses L3 and L4 to match the size of the 97-element CSDM. A field stop placed between L3 and L4 suppresses stray light. After correction by the CSDM, the beam passes back through L4, L3 and the BS, where it is narrowed and split. Finally, the beam reaches lens L2, and the speckle images are captured by the HSC. A computer processes the captured images, executes the algorithms and controls the CSDM through the driving circuits to compensate the aberrations.
In our experiments, SPGD and NadamSPGD are evaluated under the same initial conditions for a fair comparison. A set of initial CSDM control voltages is generated from Zernike coefficients to produce the initial wavefront aberration, and the algorithms then iteratively compute control voltages that gradually flatten the CSDM itself. This flattening process emulates the correction of atmospheric turbulence. The energy concentration rate of the speckle images is calculated to reflect the performance of SLAO. The optimal settings are Δu = 0.5 and γ = 2 for SPGD, and Δu = 0.5, α = 0.1, μ = 0.999, v = 0.99 and $\varepsilon = 10^{-8}$ for NadamSPGD. The experimental results are shown in Figure 11, from which we find that SPGD corrects slowly: after 300 iterations, the ME increases only from the initial value of 0.1142 to 0.5331, which significantly limits the system performance. In contrast, NadamSPGD dynamically adjusts the gain according to the gradient estimates and thereby converges rapidly. The ME obtained with NadamSPGD reaches 0.8828 after 300 iterations, 1.66 times that obtained with SPGD.
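The performance metric used on the platform can be computed directly from each camera frame. The sketch below assumes one common definition, the fraction of background-subtracted energy inside a circle of fixed radius around the intensity centroid; the paper does not state its exact window or background handling, so `radius`, `background` and the synthetic Gaussian spots are hypothetical. It also checks the ratio of the final MEs quoted above.

```python
import numpy as np


def energy_concentration(img, radius, background=0.0):
    """Fraction of image energy within `radius` pixels of the intensity centroid.

    Assumption: one common definition of an energy-concentration metric; the paper
    does not give its exact formula (window size, background handling).
    """
    img = np.clip(np.asarray(img, dtype=float) - background, 0.0, None)
    total = img.sum()
    if total <= 0.0:
        return 0.0
    yy, xx = np.indices(img.shape)
    cy, cx = (yy * img).sum() / total, (xx * img).sum() / total   # centroid
    inside = (yy - cy) ** 2 + (xx - cx) ** 2 <= radius ** 2
    return float(img[inside].sum() / total)


def gaussian_spot(size=64, sigma=2.0):
    """Synthetic far-field spot: a small sigma mimics a well-corrected beam."""
    yy, xx = np.indices((size, size))
    c = (size - 1) / 2.0
    return np.exp(-((yy - c) ** 2 + (xx - c) ** 2) / (2.0 * sigma ** 2))


sharp = energy_concentration(gaussian_spot(sigma=2.0), radius=5)
blurred = energy_concentration(gaussian_spot(sigma=8.0), radius=5)
print(f"sharp spot: {sharp:.3f}, blurred spot: {blurred:.3f}")

# Ratio of the final experimental MEs quoted in the text (NadamSPGD vs SPGD, 300 iterations)
print(f"ME ratio: {0.8828 / 0.5331:.2f}")
```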
Figure 11. Comparison of correction results using SPGD and NadamSPGD on the SLAO experiment platform: (a) initial far-field image before correction; (b) SPGD correction result after 300 iterations; (c) SPGD residual wavefront ME as a function of iterations; (d) initial far-field image before correction; (e) NadamSPGD correction result after 300 iterations; (f) NadamSPGD residual wavefront ME as a function of iterations.

Conclusions
In this paper, a novel NadamSPGD algorithm combining Nadam and SPGD is proposed to compensate wavefront aberrations more effectively in CFSOC systems. Theoretical analysis and numerical simulations demonstrate that NadamSPGD suppresses the negative influence of atmospheric turbulence of varying strength on the ME and BER of the CFSOC system. Specifically, by integrating NAG momentum and adaptive gain coefficients into conventional SPGD, the proposed algorithm not only accelerates the correction by approximately 50% but also improves robustness to the parameter setting over a large range (Δu ∈ [0.01, 1]) without noticeably increasing the complexity of the algorithm. Moreover, the stronger the turbulence, the more pronounced the advantages of NadamSPGD. In addition, NadamSPGD maintains better dynamic performance as the Greenwood frequency increases and is therefore more suitable for real-time SLAO systems. Finally, the proposed algorithm is evaluated on our SLAO experimental platform, and the experimental results show that NadamSPGD converges much faster. In conclusion, NadamSPGD is more effective for SLAO in improving the communication quality of CFSOC systems and is a good substitute for SPGD.
Based on the findings of this paper, researchers can design high-performance SLAO systems for CFSOC using NadamSPGD. The proposed algorithm may also benefit other applications of SLAO-based wavefront correction, such as astronomical observation, coherent beam combining of fiber lasers and biological microscopic imaging. In the future, we will build a high-performance processing platform based on FPGA and GPU and apply NadamSPGD to dynamic aberration correction experiments.