Impact of Beacon Feedback on Stabilizing RL-Based Power Optimization in SLM-Controlled FSO Uplinks Under Turbulence

Seifi, Erfan; LoPresti, Peter

doi:10.3390/photonics12100979

by Erfan Seifi and Peter LoPresti *

Department of Electrical and Computer Engineering, University of Tulsa, Tulsa, OK 74104, USA

* Author to whom correspondence should be addressed.

Photonics 2025, 12(10), 979; https://doi.org/10.3390/photonics12100979

Submission received: 21 August 2025 / Revised: 15 September 2025 / Accepted: 30 September 2025 / Published: 1 October 2025

(This article belongs to the Special Issue Innovations in Optical Wireless Communications: Challenges and Opportunities)


Abstract

Atmospheric turbulence severely limits the stability and reliability of free-space optical (FSO) uplinks by inducing wavefront distortions and random intensity fluctuations. This study investigates the use of reinforcement learning (RL) with beacon-based feedback for adaptive beam shaping in a spatial light modulator (SLM)-controlled FSO link. The RL agent dynamically adjusts phase patterns to maximize received signal strength, while the beacon channel provides turbulence estimates that guide the optimization process. Experiments under low, moderate, and high turbulence levels demonstrate that incorporating beacon feedback can enhance link stability in severe conditions, reducing signal variability and suppressing extreme fluctuations. In low-turbulence scenarios, the performance is comparable to non-feedback operation, whereas under high turbulence, beacon-assisted control consistently achieves lower coefficients of variation and improved bit error rate (BER) performance. Replay experiments under high turbulence, in which the best-performing RL-learned phase patterns are reapplied without learning, further show that configurations trained with feedback retain robustness even without real-time turbulence measurements. These results highlight the potential of integrating contextual feedback with RL to achieve turbulence-resilient and stable optical uplinks in dynamic atmospheric environments.

Keywords: phase modulation; CubeSats; reinforcement learning (RL); spatial light modulator (SLM)

1. Introduction

Free-space optical (FSO) communication systems offer high-capacity, license-free wireless links but are highly susceptible to atmospheric disturbances. Turbulence leads to wavefront distortions and intensity fading, degrading link reliability—especially in long-range and satellite-to-ground applications. These effects have been modeled using distributions such as log-normal and Gamma–Gamma to assess BER and channel capacity under varying turbulence levels [1,2]. Environmental factors such as fog, beam misalignment, and dynamic traffic conditions further exacerbate degradation [3,4]. Hybrid RF/FSO systems, often combined with cognitive radio approaches, improve reliability by rerouting through RF when the optical path is disrupted [5]. Other mitigation strategies include spatial diversity, adaptive modulation, and system reconfiguration under severe conditions [6,7,8]. Recently, turbulence compensation has also been explored using machine learning methods such as lightweight neural networks and federated learning [9,10,11,12,13], as well as hardware-integrated photonic circuits and adaptive optics platforms [14,15]. However, these solutions often rely on complex calibration routines or require offline pretraining, limiting adaptability in dynamic scenarios.

These limitations are especially critical in the context of modern hybrid satellite communication infrastructure. The growing demands of global connectivity, disaster resilience, and high-throughput services have motivated the evolution of satellite–ground links into more agile, intelligent systems. Space–Air–Ground Integrated Networks (SAGINs) are a prominent example, leveraging hybrid FSO/RF architectures to combine the large capacity of optical links with the robustness of RF channels under adverse conditions [16]. To address turbulence and misalignment in such systems, several studies have proposed adaptive transmission strategies [17], hybrid modulation schemes [18], and strategic deployment of high-altitude platforms (HAPS) to relay and stabilize coverage [19]. These designs underscore the necessity of real-time optical link adaptation, especially in mobile and weather-impaired environments. Foundational surveys have identified beam tracking, turbulence modeling, and alignment tolerance as key technical bottlenecks [20], while advances in coherent FSO design have opened new pathways for dynamic beam control in satellite systems [21]. While such hybrid designs mitigate link outages, they still depend on robust optical links as the backbone. This highlights the need for real-time turbulence compensation techniques at the physical layer, which is the focus of this work.

Liquid crystal-on-silicon (LCOS) spatial light modulators (SLMs) offer a powerful toolset for high-resolution wavefront shaping in such adaptive platforms. Their reflective architecture and dense pixel arrays enable precise phase modulation in compact, programmable optical systems [22,23,24,25]. LCOS SLMs have been widely applied to generate structured beams like vortex, Bessel–Gauss, and Airy modes [26,27], as well as in applications spanning adaptive optics, turbulence simulation, and digital focusing [28]. Commercial devices now offer fine phase control with high optical efficiency [29], supporting real-time beam shaping. Meanwhile, emerging research has explored their integration into dynamic beam steering [30], nonlinear modulation platforms [31], and model-based optimization using neural surrogates [32] or iterative wavefront shaping through scattering media [33,34].

In this work, we build on these advances to develop an adaptive beam shaping system that uses reinforcement learning (RL) and beacon feedback to control an LCOS SLM. Unlike prior approaches that require channel modeling or calibration, our system learns directly from intensity measurements, enabling responsive adaptation to turbulence. We demonstrate that this method improves both the average received signal power and its temporal stability under dynamic free-space optical conditions. The advantage of incorporating an RL agent, rather than a traditional supervised deep neural network (DNN), is that RL optimizes the SLM phase directly from the receiver’s scalar reward; no labeled data is required. This makes RL naturally suitable for near-real-time adaptation to non-stationary turbulence. In contrast, a supervised DNN would require a large labeled dataset covering many turbulence states and could generalize poorly under distribution shifts. Updating such a model during deployment typically requires retraining.

2. Experimental Configuration

The experimental configuration and setup, shown in Figure 1 and Figure 2, replicates an FSO communication channel under controllable turbulence conditions. A pseudorandom bit sequence (PRBS) is first OOK-modulated at 850 nm using a Highland Technology J724 electro-optic converter operating at 10 kHz, emulating the uplink data signal. This beam is subsequently phase-shaped by a HOLOEYE ERIS-1.1 SLM, an LCOS device offering $1920 \times 1200$ resolution, an 8.0 μm pixel pitch, full $2π$ phase modulation capability at 850 nm, and a display refresh rate of up to 60 Hz. Phase masks are updated via the HOLOEYE SDK, enabling RL control of Zernike-based phase profiles.

After modulation, the beam propagates through a custom-built turbulence chamber with a length of 77 cm, equipped with two heating elements and two fans at its base. Fan speeds of 2400 RPM and 3100 RPM are referred to as medium and high turbulence levels, respectively. The received optical signal is collected by a Thorlabs PDA05CF2 amplified photodetector and digitized by a Measurement Computing USB-1608FS-Plus DAQ device at 100 kHz. A 0.3 s window is recorded for each trial, and the averaged received intensity serves as a scalar reward for the RL training loop.

In parallel, a 660 nm beacon beam is propagated in the reverse direction to provide feedback on turbulence intensity. A Thorlabs DET210 photodetector placed near the uplink transmitter captures the returned beacon signal. These measurements are used to estimate the refractive index structure constant ($C_{n}^{2}$), which is fed to the RL agent as part of the observation vector, offering contextual information about prevailing turbulence conditions.

2.1. Latency and Feasibility Analysis

To quantify the responsiveness of our control loop, we instrumented each stage of the experiment to measure its contribution to per-trial latency. The results for the beacon-incorporated case are summarized in Table 1. Phase generation on the host PC required 363 ms on average, while transfer of the computed phase map to the SLM hardware driver added only 48 ms. RL inference using the TD3 MLP policy was negligible, at ∼1.2 ms. The dominant component of the loop was the DAQ acquisition of the received signal, which averaged ∼615 ms, with occasional jitter due to buffering. For a satellite at 550 km altitude seen on the horizon, the slant distance to the ground station is about 2704 km, which corresponds to a one-way latency of 9 ms for the laser beam to reach the satellite from the ground. Taken together, the end-to-end loop averaged 1035 ms (1026 ms + 9 ms) per trial, consistent with the logged total step times. These results indicate that the RL policy itself introduces a minimal computational burden; the bottleneck lies in the acquisition pipeline rather than in inference or actuation. In a practical satellite-to-ground scenario, the beacon intensity can be averaged and broadcast to the ground station over an RF feedback channel in parallel with optical transmission. Given the current computing power (Table 3) and the DAQ used in this experiment, the RL model achieves convergence within approximately 200 trials, which corresponds to 207 s (3.45 min) of operation at the 1035 ms loop time of the experiment. Replacing the high-latency laboratory DAQ interface with a faster-sampling alternative, together with a low-latency detector front-end and parallel RF telemetry, would eliminate the dominant bottleneck and reduce the overall loop delay by more than an order of magnitude. The additional round-trip delay introduced by RF broadcast in LEO systems is modest, typically only 30–50 ms according to Darwish et al. [35], so the RF feedback path would not reintroduce a comparable bottleneck.

Importantly, with the measured loop time of 1035 ms per trial, the agent converges in ∼200 trials, i.e., 3.45 min, which is already within the typical 5–10 min contact window for an LEO satellite over a ground-station pass [36,37]. Assuming a higher-data-rate DAQ that reduces the acquisition delay by a factor of 10, the model would converge in approximately 1.6 min, which lies well within the practical duration of an LEO ground pass. Thus, while the current laboratory setup already meets pass-duration constraints, a faster DAQ would provide a comfortable margin and free more of the contact window for data downlink.
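The latency figures above can be reproduced with a few lines of arithmetic. The sketch below is illustrative only: the mean Earth radius of 6371 km, the function names, and the choice to leave the ∼1.2 ms inference time out of the total (as the quoted 1026 ms + 9 ms sum appears to do) are assumptions rather than details taken from the experiment code.

```python
import math

C_M_PER_S = 299_792_458.0   # speed of light in vacuum
R_EARTH_KM = 6371.0         # mean Earth radius (assumed value)


def horizon_slant_range_km(altitude_km, r_earth_km=R_EARTH_KM):
    """Distance from a ground station to a satellite located on its horizon."""
    return math.sqrt((r_earth_km + altitude_km) ** 2 - r_earth_km ** 2)


def loop_time_ms(phase_gen=363.0, slm_transfer=48.0, daq=615.0, propagation=9.0):
    """Per-trial loop time built from the stage latencies quoted in the text."""
    return phase_gen + slm_transfer + daq + propagation


def convergence_minutes(loop_ms, trials=200):
    """Wall-clock time needed for `trials` control-loop iterations."""
    return trials * loop_ms / 1000.0 / 60.0


slant_km = horizon_slant_range_km(550.0)
one_way_ms = slant_km * 1e3 / C_M_PER_S * 1e3
current_min = convergence_minutes(loop_time_ms())
# Hypothetical faster DAQ: acquisition delay reduced by a factor of 10.
faster_min = convergence_minutes(loop_time_ms(daq=615.0 / 10.0))

print(f"slant range {slant_km:.0f} km, one-way delay {one_way_ms:.1f} ms")
print(f"convergence: {current_min:.2f} min now, {faster_min:.2f} min with faster DAQ")
```

Under these assumptions the script gives a slant range of about 2704 km, a one-way delay of about 9 ms, and convergence times of 3.45 min and roughly 1.6 min, in line with the values quoted in the text.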

2.2. RL Model Hyperparameters

Table 2 reports the complete TD3 configuration used in our experiments. Unless otherwise noted, hyperparameters follow the Stable-Baselines3 (SB3) defaults for TD3 with MlpPolicy. The actor and critic share the same multi-layer perceptron (MLP) architecture. Exploration during data collection uses zero-mean Gaussian action noise with a per-dimension standard deviation of $σ = 0.5$. The replay buffer is uniform and off-policy. We selected the MLP policy because the observation vector in our setup is low-dimensional (14 Zernike coefficients and the turbulence estimate $C_{n}^{2}$) and does not contain spatial/temporal correlations that would benefit from convolutional neural networks (CNNs) or recurrent neural networks (RNNs). In continuous control tasks with compact state/action spaces, MLPs have been shown to be both efficient and robust [38,39]. We adopted learning rates of $3 \times 10^{-4}$ for both the actor and critic, consistent with the values recommended in [39]. This balances convergence stability and training speed. Using the Adam optimizer is also standard, as it handles the noisy gradient updates common in RL.

Training and control were run on a laptop using the CPU only. The hardware used in this experiment is summarized in Table 3.

The TD3 checkpoints (actor/critic weights), training logs, and per-trial DAQ CSV files have been archived and are publicly available at https://doi.org/10.21227/ekcp-dr66 (accessed on 15 September 2025), enabling independent reproduction and validation of the reported results.

3. Methodology

This study investigates adaptive optical beam shaping for FSO links under turbulence using two methods: RL-based optimization and replay pattern (RP)-based evaluation. RL iteratively tunes Zernike phase parameters on a spatial light modulator (SLM) to maximize received power, while RP reapplies high-performance, low-fluctuation patterns from prior RL runs to assess stability outside the learning loop. The following subsections outline each method’s configuration and execution.

3.1. RL Training

The adaptive optical beam shaping problem is framed as an RL task in which an agent learns to control the spatial phase profile applied to a spatial light modulator (SLM). The objective is to enhance the received optical signal by dynamically tuning Zernike phase parameters. The learning environment is constructed using the OpenAI Gym framework and directly interfaces with the physical optical setup.

The agent’s action space consists of a 14-dimensional vector of Zernike coefficients:

Z = [z_{1}, z_{2}, \dots, z_{14}] \in {[- 5, 5]}^{14},

(1)

In (1), each $z_{i}$ denotes the amplitude of a particular Zernike polynomial mode. These coefficients are mapped to a continuous phase mask by computing a weighted sum over the corresponding polynomials $Z_{n}^{m}(ρ, θ)$ in normalized polar coordinates $(ρ, θ)$ across the aperture of the SLM. The complete phase pattern $ϕ(x, y)$ is defined as follows:

ϕ (x, y) = \sum_{i = 1}^{14} z_{i} \cdot Z_{i} (ρ, θ),

(2)

Equation (2) includes polynomial indexing following Noll’s convention. This phase map is uploaded to the SLM; then, the resulting beam is received, and its intensity is recorded by a photodetector. The received signal serves as feedback to the RL agent.
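As a rough illustration of how a coefficient vector could be turned into a phase map via (2), the sketch below evaluates the first 14 Noll-indexed Zernike polynomials on a square grid. Several choices here are assumptions and not taken from the experiment code: the Noll normalization factors, the unit disc inscribed in a square grid, zero phase outside the aperture, and wrapping the result to $[0, 2π)$. The upload through the HOLOEYE SDK is not shown.

```python
import math
import numpy as np


def noll_to_nm(j):
    """Map a Noll index j >= 1 to the radial order n and azimuthal order m."""
    n, j1 = 0, j - 1
    while j1 > n:
        n += 1
        j1 -= n
    m = (-1) ** j * ((n % 2) + 2 * ((j1 + ((n + 1) % 2)) // 2))
    return n, m


def zernike(j, rho, theta):
    """Noll-normalized Zernike polynomial Z_j (normalization is an assumption)."""
    n, m = noll_to_nm(j)
    radial = np.zeros_like(rho)
    for k in range((n - abs(m)) // 2 + 1):
        coeff = ((-1) ** k * math.factorial(n - k)
                 / (math.factorial(k)
                    * math.factorial((n + abs(m)) // 2 - k)
                    * math.factorial((n - abs(m)) // 2 - k)))
        radial += coeff * rho ** (n - 2 * k)
    if m == 0:
        return math.sqrt(n + 1) * radial
    angular = np.cos(m * theta) if m > 0 else np.sin(-m * theta)
    return math.sqrt(2 * (n + 1)) * radial * angular


def phase_mask(z, size=201, wrap=True):
    """Evaluate Eq. (2) on a size x size grid; the grid layout is hypothetical."""
    axis = np.linspace(-1.0, 1.0, size)
    x, y = np.meshgrid(axis, axis)
    rho, theta = np.hypot(x, y), np.arctan2(y, x)
    phi = sum(zi * zernike(j, rho, theta) for j, zi in enumerate(z, start=1))
    phi = np.where(rho <= 1.0, phi, 0.0)   # zero phase outside the unit aperture
    return np.mod(phi, 2.0 * np.pi) if wrap else phi


coeffs = np.zeros(14)
coeffs[3] = 1.0                            # pure defocus (Noll index 4)
mask = phase_mask(coeffs, size=11)
```

A real deployment would additionally have to resample the mask to the $1920 \times 1200$ panel and convert phase to the gray levels expected by the device.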

To evaluate the agent’s performance, a scalar reward is computed by comparing the average received signal intensity ($I_{mean}$) against a flat-phase reference intensity ($I_{ref}$):

r = 100 \cdot (I_{mean} - I_{ref}) + B,

(3)

In (3), B is a discrete bonus term of up to 150 awarded in tiers when the intensity improvement exceeds 20%, 25%, and 30% thresholds.
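The reward in (3) can be sketched as a small function. The text fixes only the maximum bonus (150) and the three thresholds; the intermediate tier values of 50 and 100, and the interpretation of "improvement" as the relative gain over $I_{ref}$, are assumptions made here for illustration.

```python
# Hypothetical tier table: (relative improvement threshold, bonus).
# Only the thresholds and the 150 maximum come from the text.
DEFAULT_TIERS = ((0.30, 150.0), (0.25, 100.0), (0.20, 50.0))


def tiered_bonus(improvement, tiers=DEFAULT_TIERS):
    """Return the bonus B for a fractional improvement over the reference."""
    for threshold, bonus in tiers:      # tiers are ordered from highest to lowest
        if improvement > threshold:
            return bonus
    return 0.0


def reward(i_mean, i_ref, tiers=DEFAULT_TIERS):
    """Eq. (3): r = 100 * (I_mean - I_ref) + B."""
    improvement = (i_mean - i_ref) / i_ref   # assumed definition of improvement
    return 100.0 * (i_mean - i_ref) + tiered_bonus(improvement, tiers)


r_flat = reward(1.0, 1.0)    # no improvement over the flat-phase reference
r_good = reward(1.4, 1.0)    # 40% improvement, top tier
```

Passing a different `tiers` tuple makes it easy to test how sensitive the learned policy is to the shape of the bonus.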

Additionally, the separate beacon channel operating at 660 nm is used to estimate atmospheric turbulence during each trial. The scintillation index, derived from intensity fluctuations in the beacon channel, yields the Rytov variance ($σ_{R}^{2}$), which is used to compute the refractive index structure constant ($C_{n}^{2}$) as follows:

C_{n}^{2} = \frac{σ_{R}^{2}}{1.23 k^{7 / 6} L_{p}^{11 / 6}},

(4)

In (4), $k = 2π / λ$ is the optical wavenumber, and $L_{p}$ is the propagation path length of 77 cm. This turbulence estimate is appended to the agent’s observation vector.
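A minimal sketch of the estimate in (4) is given below. It assumes the weak-fluctuation regime, in which the measured scintillation index is used directly as the Rytov variance; the function names are hypothetical, and the defaults are the 660 nm beacon wavelength and 77 cm path length stated in the text.

```python
import numpy as np


def scintillation_index(intensity):
    """Normalized intensity variance, <I^2> / <I>^2 - 1, of the beacon samples."""
    i = np.asarray(intensity, dtype=float)
    return i.var() / i.mean() ** 2


def cn2_from_rytov(rytov_var, wavelength_m=660e-9, path_m=0.77):
    """Eq. (4): invert sigma_R^2 = 1.23 * Cn2 * k^(7/6) * L_p^(11/6)."""
    k = 2.0 * np.pi / wavelength_m
    return rytov_var / (1.23 * k ** (7.0 / 6.0) * path_m ** (11.0 / 6.0))


def estimate_cn2(intensity, **kwargs):
    # Assumption: weak turbulence, so the scintillation index is taken
    # as the Rytov variance without any further correction.
    return cn2_from_rytov(scintillation_index(intensity), **kwargs)


# Synthetic beacon record standing in for the DET210 samples.
rng = np.random.default_rng(0)
beacon = 1.0 + 0.02 * rng.standard_normal(30_000)
cn2 = estimate_cn2(beacon)
```

With these defaults, a Rytov variance of roughly $6 \times 10^{-4}$ maps to a $C_{n}^{2}$ in the $10^{-12}$ m$^{-2/3}$ range, which gives a feel for how small the beacon fluctuations are over such a short path.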

The complete observation vector provided to the agent is expressed as follows:

O = [z_{1}, z_{2}, \dots, z_{14}, C_{n}^{2}],

(5)

Equation (5) combines the current beam shaping parameters with an estimate of environmental turbulence. Training is carried out using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm (Algorithm 1) [39], which balances stability and exploration through dual Q-networks and target delays. The agent improves its policy iteratively based on the feedback from the optical setup, with intermediate model checkpoints and performance logs saved during the training sequence.

Algorithm 1 RL-Based Adaptive Beam Shaping With Beacon Feedback

1: Initialize Zernike vector $Z \leftarrow 0$
2: Measure baseline intensity $I_{ref}$ with flat-phase SLM
3: Initialize TD3 agent with exploration noise
4: for each training episode $t = 1$ to $T = 1000$ do
5:     Sample action $Z \sim π_{θ} (O_{t}) + noise$
6:     Generate phase mask $ϕ (x, y) = \sum_{i = 1}^{14} z_{i} Z_{i} (ρ, θ)$
7:     Display $ϕ (x, y)$ on SLM
8:     Acquire received signal and beacon signal via DAQ
9:     Compute $I_{mean}$ and estimate $C_{n}^{2}$
10:     Compute reward $r \leftarrow 100 \cdot (I_{mean} - I_{ref}) + B$
11:     Construct next state $O_{t + 1} = [Z, C_{n}^{2}]$
12:     Store transition $(O_{t}, Z, r, O_{t + 1})$ in replay buffer
13:     Update TD3 agent from minibatch
14:     Save trial output (pattern parameters, reward, $C_{n}^{2}$)
15: end for
16: Save trained policy
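The data flow of Algorithm 1 can be sketched without any hardware by injecting the policy and the measurement as callables. Everything below is a hypothetical stand-in: `policy` replaces the TD3 actor, `measure` replaces the SLM display and DAQ acquisition, the TD3 update and the bonus term $B$ are omitted, and the initial turbulence entry of the observation is set to zero by assumption.

```python
import numpy as np

N_MODES = 14


def collect_transitions(policy, measure, i_ref, n_trials=1000, sigma=0.5, seed=0):
    """Gather (O_t, Z, r, O_{t+1}) tuples following steps 5-12 of Algorithm 1.

    policy(obs) -> array of 14 coefficients (stand-in for the TD3 actor)
    measure(z)  -> (I_mean, Cn2)            (stand-in for SLM + DAQ + beacon)
    """
    rng = np.random.default_rng(seed)
    obs = np.zeros(N_MODES + 1)                 # [Z = 0, Cn2 = 0] before any trial
    buffer = []
    for _ in range(n_trials):
        noise = rng.normal(0.0, sigma, N_MODES)
        z = np.clip(policy(obs) + noise, -5.0, 5.0)   # step 5, kept in [-5, 5]
        i_mean, cn2 = measure(z)                      # steps 6-9
        r = 100.0 * (i_mean - i_ref)                  # step 10 without bonus B
        next_obs = np.append(z, cn2)                  # step 11
        buffer.append((obs, z, r, next_obs))          # step 12
        obs = next_obs
    return buffer


def toy_policy(obs):
    return np.zeros(N_MODES)


def toy_measure(z):
    # Toy plant: intensity drops as the coefficients move away from zero.
    return 1.0 - 0.01 * float(np.sum(z ** 2)), 1e-12


transitions = collect_transitions(toy_policy, toy_measure, i_ref=1.0, n_trials=5)
```

In the actual system the buffer would be owned by the TD3 implementation and an update from a minibatch would follow each stored transition (step 13).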

3.2. Replay-Based Evaluation of Low-Fluctuation, High-Reward Phase Patterns

To evaluate the reproducibility and stability of beam shaping performance using previously optimized SLM phase maps, we performed a replay-based experiment using parameter sets derived from TD3 training. From the training log, Zernike polynomial coefficients were first filtered to retain only trials with a reward greater than or equal to 90% of the maximum recorded value. For each of these high-reward trials, the corresponding recorded DAQ data from the original run was analyzed, and only the samples where the received optical signal exceeded a 0.1 V threshold (“ones”) were considered. The mean, standard deviation and coefficient of variation of these “ones” were computed, and the trials were ranked by their standard deviation. The 10 lowest standard-deviation trials were selected as the most stable high-reward patterns for replay.
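The selection rule described above (keep trials within 90% of the best reward, then rank by the standard deviation of the "ones") can be sketched as follows. The dictionary layout of a trial record and the key names are hypothetical; the 0.1 V threshold and the top-10 cut come from the text.

```python
import numpy as np


def ones_std(samples, threshold=0.1):
    """Sample standard deviation of the voltages above the '1' threshold."""
    x = np.asarray(samples, dtype=float)
    return x[x > threshold].std(ddof=1)


def select_replay_patterns(trials, top_k=10, reward_fraction=0.9, threshold=0.1):
    """Pick the most stable high-reward trials.

    trials: list of dicts with keys 'reward', 'zernike', 'samples'
            (a hypothetical layout for the training log and DAQ records).
    """
    best = max(t["reward"] for t in trials)
    # Note: a 90% cut-off only makes sense when the best reward is positive.
    high = [t for t in trials if t["reward"] >= reward_fraction * best]
    ranked = sorted(high, key=lambda t: ones_std(t["samples"], threshold))
    return ranked[:top_k]


log = [
    {"id": 0, "reward": 100.0, "zernike": [0.0] * 14, "samples": [0.0, 0.50, 0.60, 0.70]},
    {"id": 1, "reward": 95.0, "zernike": [0.1] * 14, "samples": [0.0, 0.59, 0.60, 0.61]},
    {"id": 2, "reward": 50.0, "zernike": [0.2] * 14, "samples": [0.0, 0.60, 0.60, 0.60]},
]
picked = select_replay_patterns(log, top_k=2)
```

In this toy log, trial 2 has the steadiest signal but is discarded by the reward filter, which illustrates that stability is only compared among high-reward candidates.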

Each of the 10 selected patterns was replayed 20 times, for a total of 200 replay events. For each replay, the corresponding Zernike coefficients were used to generate a phase map, which was displayed on the SLM. The received optical signal was then recorded from a USB DAQ sampling at 100 kS/s for a duration of 0.3 s, and the resulting data were saved. The mean voltage, standard deviation, and coefficient of variation of the “ones” samples were calculated for each replay and logged, together with the pattern ID, timestamp, and Zernike parameters. This replay analysis quantifies the temporal robustness of previously learned, low-fluctuation, high-reward beam configurations and assesses their stability when reapplied outside the reinforcement learning loop.

4. Results and Discussion

To assess the effect of turbulence and beacon feedback on learning dynamics and signal robustness, we evaluate the RL agent across three turbulence regimes: low (72 °F, high-speed fan: 3100 RPM), moderate (380 °F, medium-speed fan: 2400 RPM), and high (460 °F, high-speed fan: 3100 RPM). The selection of fan speeds was guided by the dual requirement of inducing sufficient airflow to generate turbulence and maintaining thermal stability at the desired operating temperatures. The corresponding refractive index structure parameters ($C_{n}^{2}$) for these conditions are $4.50 \times 10^{-15}$ m$^{-2/3}$, $2.13 \times 10^{-12}$ m$^{-2/3}$, and $5.73 \times 10^{-12}$ m$^{-2/3}$, indicating increasing levels of optical turbulence for the propagation path length ($L_{p}$) of 77 cm (turbulence chamber length). For each condition, performance is evaluated based on the average received voltage and the coefficient of variation (CV). To enable a fair comparison between configurations, we focus on the exploitation phase (trials 200–999), during which the agent relies primarily on accumulated experience to refine its policy. In this phase, higher mean voltage values indicate more efficient beam coupling at the receiver, while lower CVs reflect improved link stability in the presence of turbulence.

To analyze the statistical behavior of received optical signals across different trials, we processed the DAQ-acquired data from each experiment. Signals were recorded across 1000 individual trials and saved as CSV files. Each CSV file contains voltage measurements from a single trial of 0.3 s.

For each trial, we applied a threshold of 0.1 V to isolate logical ‘1’ samples in the OOK modulation scheme. All voltage values exceeding this threshold were retained and analyzed. For every trial ($i$), the mean ($μ_{i}$) and standard deviation ($σ_{i}$) of the logical ‘1’ voltage samples were computed as follows:

μ_{i} = \frac{1}{N_{i}} \sum_{j = 1}^{N_{i}} x_{i j},

(6)

σ_{i} = \sqrt{\frac{1}{N_{i} - 1} \sum_{j = 1}^{N_{i}} {(x_{i j} - μ_{i})}^{2}},

(7)

In (6) and (7), $x_{i j}$ denotes the $j$-th logical ‘1’ sample in trial $i$, and $N_{i}$ is the number of such samples.
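Equations (6) and (7) amount to thresholding one trial's voltage record and taking the sample mean and the unbiased standard deviation of what remains. A minimal sketch, assuming the record is already available as a one-dimensional array (the CSV layout is not specified in the text, so file loading is left out):

```python
import numpy as np


def trial_stats(voltages, threshold=0.1):
    """Return (mu_i, sigma_i) of the logical '1' samples, Eqs. (6) and (7)."""
    x = np.asarray(voltages, dtype=float)
    ones = x[x > threshold]              # keep only samples above 0.1 V
    if ones.size < 2:
        raise ValueError("need at least two samples above the threshold")
    return ones.mean(), ones.std(ddof=1)  # ddof=1 gives the 1 / (N_i - 1) form


mu_i, sigma_i = trial_stats([0.0, 0.5, 0.7, 0.02, 0.6])
```

The guard against fewer than two samples is an added safety check, since (7) is undefined for $N_{i} = 1$.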

The results are visualized as two-column subplots across six experimental conditions in Figure 3. The left column displays the mean voltage values ($μ_{i}$) per trial, while the right column shows the corresponding standard deviations ($σ_{i}$). Each subplot corresponds to one experimental setup (defined by temperature and feedback condition), and the x-axis represents the trial index. The y-axes are labeled accordingly to reflect the statistical quantity being plotted. This visualization enables a direct comparison of signal consistency and stability across different operating conditions. The $μ_{i}$ reflects the average intensity of the received logical ‘1’ signal in trial $i$, serving as an indicator of beam-shaping effectiveness, while $σ_{i}$ captures the trial-specific temporal fluctuations around that mean, quantifying the variability in received signal strength.

4.1. RL Training Analysis

4.1.1. Block-Level Analysis of Logical ‘1’ Signal Stability

To evaluate the statistical stability of received signal samples classified as logical ‘ones’ in the OOK modulation scheme, block-based metrics were computed using recorded voltage data under various thermal and feedback conditions (see Table 4 and Table 5 and Figure 3). The analysis was restricted to non-overlapping blocks of $N = 100$ trials, starting from the 200th trial to exclude the initial learning phase. Within each trial ($i$), the mean voltage of logical ‘1’ samples was computed using (6). For each block ($B$), the block-level average signal strength was computed by averaging the per-trial means:

μ_{b} = \frac{1}{| B |} \sum_{i \in b} μ_{i},

(8)

In (8), $| B | = 100$ is the number of trials in the block and $b$ is a block index from 3 to 10 representing non-overlapping blocks of 100 trials in consecutive order from 200 to 999. This quantity represents the average signal level across the block. Voltage measurements were converted to received optical power using the Thorlabs detector specifications:

P (W) = \frac{V}{G \cdot R} .

(9)

In (9), $R = 0.2$ A/W is the detector responsivity at the operating wavelength and $G = 10^{4}$ V/A is the transimpedance gain. We used the $μ_{b}$ values to obtain the average Rx power from (9) for each block. To assess the relative variability in signal level across trials within each block, the standard deviation of the per-trial means was computed as follows:

σ_{μ_{b}} = \sqrt{\frac{1}{| B | - 1} \sum_{i \in b} {(μ_{i} - μ_{b})}^{2}} .

(10)

The block-level coefficient of variation (CV) was then defined as follows:

{BlockCV}_{b} = \frac{σ_{μ_{b}}}{μ_{b}} \times 100 % .

(11)

Equation (11) quantifies the normalized fluctuations in signal level across trials within each 100-trial segment. A lower ${BlockCV}_{b}$ indicates more consistent signal strength and, thus, higher stability under the associated experimental conditions. Table 4 and Table 5 summarize the average received power and the block-level coefficient of variation (CV) for logical ‘ones’ under varying thermal conditions and feedback configurations. These metrics reveal both the central tendency and trial-to-trial consistency of signal reception.
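The block-level quantities in (8) to (11) can be computed in one pass over the per-trial means. The sketch below assumes zero-based trial indexing, so that trials 200–999 form eight blocks of 100, and takes $G$ and $R$ from the values quoted for (9); the function name and return layout are hypothetical.

```python
import numpy as np


def block_metrics(trial_means, start=200, block=100, gain=1e4, responsivity=0.2):
    """Return (mu_b, power_w, block_cv) for consecutive non-overlapping blocks."""
    mu = np.asarray(trial_means, dtype=float)[start:]
    n_blocks = mu.size // block
    blocks = mu[: n_blocks * block].reshape(n_blocks, block)
    mu_b = blocks.mean(axis=1)                 # Eq. (8): block-average voltage
    power_w = mu_b / (gain * responsivity)     # Eq. (9): volts to watts
    sigma_b = blocks.std(axis=1, ddof=1)       # Eq. (10): spread of trial means
    block_cv = 100.0 * sigma_b / mu_b          # Eq. (11): percent
    return mu_b, power_w, block_cv


# Synthetic per-trial means: a constant 0.2 V signal over 1000 trials.
mu_b, power_w, block_cv = block_metrics(np.full(1000, 0.2))
```

For the constant 0.2 V input every block maps to 100 μW with zero CV, which is a quick sanity check on the unit conversion.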

At 72 °F, the beacon-enabled configuration exhibits slightly lower received power than the no-feedback case and higher CV values across most blocks. At 380 °F, the comparison follows a similar trend: the average power remains slightly lower in the feedback-enabled configuration across nearly all blocks, and the block-level CV values are higher in the feedback case in most windows (e.g., 6.59% vs. 5.70% in the 300–399 block and 7.37% vs. 5.33% in the 800–899 block), which reflects greater fluctuations. The reason may lie in the partially stratified turbulence regime caused by activating only one internal heating element in the chamber under moderate turbulence. This asymmetry likely introduced non-uniform and dynamic refractive gradients, producing unstable beam paths in which the beacon-informed agent introduced fluctuations as it continuously adapted to evolving turbulence patterns. At 460 °F, the beacon-based method achieved higher or similar average power in all blocks, and it consistently achieved lower CV values in all blocks. This indicates that beacon feedback played a pivotal role in reducing volatility under intense turbulence.

In summary, the impact of beacon feedback on signal stability (as measured by CV) is most visible under severe turbulence conditions (e.g., 460 °F), where it consistently reduces fluctuations across all blocks. In moderate, partially stratified turbulence (380 °F), the dynamic realignment process can introduce some variability, yet the overall signal remains comparably strong. Under low turbulence (72 °F), the feedback mechanism does not provide a clear advantage, likely due to the already stable channel conditions. Nonetheless, the results demonstrate that beacon feedback is a valuable and adaptive tool for enhancing robustness in FSO communication under intense turbulence. These trends are also visible in the signal behavior shown in the first column of Figure 3.

4.1.2. Statistical Analysis of Temporal Signal Stability

Section 4.1.1 examined trial-to-trial fluctuations but did not analyze the temporal variability within each 0.3 s trial window. In this section, we address this aspect by evaluating the temporal stability of the received signal under varying thermal conditions and feedback control, that is, the temporal behavior observed in the second column of Figure 3. We computed the coefficient of variation (CV) for voltages classified as logical ‘ones’ in the OOK modulation scheme on a per-trial basis. For each trial ($i$) beyond the 200th, ${CV}_{i}$ was defined as follows:

{CV}_{i} = \frac{σ_{i}}{μ_{i}} \times 100 %,

(12)

In (12), $μ_{i}$ is the mean and $σ_{i}$ is the standard deviation of the logical ‘1’ samples in trial $i$. These were computed using (6) and (7), respectively. Thus, ${CV}_{i}$ provides a scale-invariant measure of noise fluctuations relative to the signal level.

To characterize the distribution of signal stability across all trials beyond the 200th, we calculated several statistical descriptors of the per-trial CVs: the mean CV, which reflects the average noise fluctuation normalized to signal strength across trials; the median CV, which represents the midpoint of the CV distribution and is less influenced by extreme values; the interquartile range (IQR), which quantifies the spread of the middle 50% of CVs and indicates distribution consistency; and the total range, which captures the difference between the minimum and maximum CV values observed, reflecting the full extent of trial-to-trial variability.

\mathrm{Mean\_CV} = \frac{1}{| M |} \sum_{i \in M} {CV}_{i} .

(13)

In (13), $i$ is the trial index in the subset of trials ($M$) beyond trial 200. Two visualization techniques were used to interpret the CV distributions: histograms and boxplots. Figure 4 shows the frequency distribution of per-trial CVs across all valid trials for each condition, providing insight into skewness, modality, and spread. Figure 5 and Table 6 summarize the distribution of CVs using the median, interquartile range (IQR), and total range, highlighting differences between temperatures and feedback conditions.
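The descriptors listed above can be obtained directly from the per-trial means and standard deviations. The sketch below treats the mean CV as the arithmetic mean described in the text, uses NumPy's default (linear-interpolation) percentiles for the IQR, and assumes zero-based trial indexing; the function name and dictionary keys are hypothetical.

```python
import numpy as np


def cv_summary(trial_means, trial_stds, start=200):
    """Summarize per-trial CVs (Eq. 12) for all trials from index `start` on."""
    mu = np.asarray(trial_means, dtype=float)[start:]
    sd = np.asarray(trial_stds, dtype=float)[start:]
    cv = 100.0 * sd / mu                              # Eq. (12), in percent
    q1, median, q3 = np.percentile(cv, [25, 50, 75])
    return {
        "mean": cv.mean(),
        "median": median,
        "iqr": q3 - q1,
        "range": cv.max() - cv.min(),
    }


# Tiny worked example: five trials with CVs of 1%, 2%, 3%, 4%, and 5%.
summary = cv_summary(np.ones(5), [0.01, 0.02, 0.03, 0.04, 0.05], start=0)
```

For this example the mean and median are both 3%, the IQR is 2%, and the range is 4%.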

The results indicate that beacon feedback enhances signal stability, more clearly at the low and high temperatures than at the moderate one. At 72 °F, the median CV drops from 3.60% to 2.34% with feedback, suggesting more consistent signal quality. The histogram in Figure 4 further supports this, showing tighter clustering near lower CV values. At elevated temperatures (380 °F and 460 °F), although the median and mean CVs of the feedback and no-feedback cases are comparable, the range of variability is reduced under feedback, most notably at 460 °F, where the CV range contracts from 14.45% to 8.52%. This indicates that beacon feedback effectively suppresses outlier behavior and ensures more uniform performance, even under high thermal stress. Moreover, at 460 °F, the histogram of the feedback-based method shows denser clustering near the lower end of the CV range.

In conclusion, the temporal analysis reveals that beacon feedback contributes to improved signal consistency across varying turbulence intensities. While the most substantial reduction in median CV is observed under low turbulence (72 °F), the primary benefit at high temperatures lies in suppressing extreme fluctuations and outliers. At 460 °F, the beacon-based configuration not only narrows the CV range considerably but also exhibits greater clustering around lower CV values, as seen in the corresponding histogram. The raw standard deviation plots in the second column of Figure 3 corroborate this finding. These analyses affirm that beacon feedback enhances the temporal robustness of the received signal, particularly by reducing temporal volatility and outlier behavior in challenging thermal environments.

4.1.3. Retrieved Power Quantification

In all experiments, the transmitted optical power was fixed at 1 mW, and the total power received on the SLM display was measured with a fiber-optic tester as 263 μW, representing a 5.80 dB loss. This loss is attributed to reflections from the two lenses in front of the laser source, which redirected part of the power in other directions within the lab setup. With the SLM display set to a flat phase, the received power under baseline conditions of 72 °F, 380 °F, and 460 °F was 62.45 μW, 64.3 μW, and 68 μW, respectively. The increasing trend is associated with glass deformation on the two sides of the turbulence box. These baseline measurements are used as the reference for power compensation analysis across all turbulence scenarios. The corresponding free-space optical path loss is then given by (14).

\mathrm{Loss} (dB) = 10 {log}_{10} (\frac{P_{T x}}{P_{R x}}),

(14)

Using (14) with $P_{T x} = 1$ mW results in total losses of 12.04 dB, 11.92 dB, and 11.67 dB under baseline conditions. After subtracting the 5.80 dB loss calculated earlier, the algorithm will try to recover the remaining losses of 6.24 dB, 6.12 dB, and 5.87 dB under each channel condition. The following analysis quantifies the portion of this baseline loss that can be recovered through RL-based adaptive optimization. To calculate the recovered power, we used (15).

\mathrm{Recovered\ Power} (dB) = 10 {log}_{10} (\frac{P_{R x}}{P_{R e f}}),

(15)

Because $P_{Ref}$ was previously measured for each channel condition, (15) gives the recovered power directly. The obtained power values are reported in Table 7.

Under baseline turbulence conditions (72 °F, 380 °F, and 460 °F), the reinforcement learning framework consistently recovered between 1.41 and 2.25 dB of the excess loss (Table 7), translating to a ∼1.4–1.7× increase in detected optical power. In practical terms, this means that up to roughly one third of the remaining loss (in dB terms) was adaptively compensated in real time, even within a short-path lab setup where aberrations dominate over scintillation. These results demonstrate the potential of a learning-based controller that continuously senses the evolving channel and actively reshapes the wavefront to maintain signal quality. While the present study validates this concept under controlled laboratory turbulence, the same principles can be extended to longer transmission distances and more severe atmospheric conditions.
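The conversion between the recovered power in dB and the linear power ratio quoted above can be checked with a few lines of Python. This is a minimal sketch: the helper name is illustrative, the dB values are the extreme entries of Table 7, and the fraction is computed in dB terms, which is our reading of the "one third" statement rather than a definition given in the text.

```python
def db_to_ratio(gain_db: float) -> float:
    """Convert a power gain in dB to a linear power ratio."""
    return 10.0 ** (gain_db / 10.0)

low_ratio = db_to_ratio(1.41)    # smallest entry in Table 7
high_ratio = db_to_ratio(2.25)   # largest entry in Table 7

# Share of the remaining loss recovered, in dB terms, for the first
# 72 °F with-beacon window: 2.08 dB recovered out of 6.24 dB remaining.
fraction_72f = 2.08 / 6.24

print(f"linear gain range: {low_ratio:.2f}x to {high_ratio:.2f}x")
print(f"recovered share at 72 °F (dB terms): {fraction_72f:.2f}")
```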

4.1.4. Bit Error Rate Analysis

Figure 6 illustrates the bit error rate (BER) versus optical signal-to-noise ratio (OSNR) characteristics across three thermal turbulence regimes, 72 °F (low), 380 °F (moderate), and 460 °F (high), with and without beacon feedback. For each setting, raw BER values are plotted alongside a fitted Cauchy distribution to approximate the system’s error response and evaluate the impact of beacon feedback on signal fidelity and robustness. At 72 °F, the two configurations perform similarly, exhibiting sharp BER decay as OSNR increases. This suggests that under low turbulence, the communication channel remains relatively stable and the contribution of beacon feedback is marginal. In contrast, under moderate and high turbulence, beacon feedback yields a steeper decrease in BER across the high-range OSNR values (>14 dB), implying more effective compensation for wavefront distortions. These observations indicate that beacon feedback can enhance performance in turbulent conditions and reinforce the need for adaptive feedback strategies tailored to channel conditions.

Together, these results underscore the advantage of beacon-based feedback in enabling both stronger and more stable optical links, particularly in dynamic and degraded channel environments. The improvements observed in BER characteristics align with the trends in signal strength and variability reported in Table 4 and Table 6.

4.2. Replay Pattern Analysis

Figure 7 displays the per-trial mean and standard deviation of the logical ‘ones’ for replayed patterns (RPs) obtained from the top-performing trials of the reinforcement learning data. The left-column subplots present $\mu_{i}$ over 200 trials, while the right-column subplots show the corresponding $\sigma_{i}$ values as defined in (6) and (7). These are analyzed further in the following subsections. Note that, where the following subsections refer to RPs from beacon-assisted data or from data collected with no beacon, this indicates only the source of the pattern displayed on the SLM device. There is no feedback used in this section.
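For readers who want to reproduce the per-trial statistics from raw samples, the following Python sketch computes a mean and standard deviation over the samples of a trial that correspond to transmitted logical ones. It is only an illustration under stated assumptions: the function name, the toy data, the use of a known bit pattern as a mask, and the choice of the population standard deviation are ours and may differ from the exact definitions in (6) and (7).

```python
from statistics import fmean, pstdev

def ones_stats(samples, bits):
    """Return (mu_i, sigma_i) over the samples whose transmitted bit is 1."""
    ones = [v for v, b in zip(samples, bits) if b == 1]
    if not ones:
        raise ValueError("trial contains no logical ones")
    # Assumption: population standard deviation; (7) may use the sample form.
    return fmean(ones), pstdev(ones)

# Toy trial: detector readings (arbitrary units) and the transmitted pattern
samples = [0.10, 1.00, 1.20, 0.05, 0.80, 0.12]
bits = [0, 1, 1, 0, 1, 0]

mu_i, sigma_i = ones_stats(samples, bits)
print(f"mu_i = {mu_i:.3f}, sigma_i = {sigma_i:.3f}")
```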

4.2.1. Block-Level Analysis of Logical ‘1’ Signal Stability

The replay experiments, as summarized in Table 8 and Table 9, evaluate the performance of selected replay patterns under turbulence conditions matched to RL training. Each table reports results in two consecutive non-overlapping 100-trial windows. Table 8 shows the average received power (computed using (8) and (9)) for logical ‘ones’ during replay. Across all conditions, RPs replayed from no-beacon training patterns yield slightly higher mean power than those from beacon-assisted patterns. This difference is consistent across 72 °F, 380 °F, and 460 °F conditions. Table 9 reports the block-level coefficient of variation (Block_CV, computed using (11)) for the same replay trials. In this case, the results diverge from the average power trends. At 72 °F and 380 °F, RPs from beacon-based data consistently exhibit higher CV values than no-beacon RPs, indicating greater trial-to-trial variability in received signal strength despite comparable average levels. At 460 °F, the RPs from beacon-assisted data maintain lower CVs than their no-beacon counterparts, suggesting improved block-level stability under stronger turbulence.

These observations imply that, while RPs derived from beacon-assisted training do not always maximize replayed signal power, they can provide stability advantages under severe turbulence (460 °F), even when the feedback is not operating. Conversely, in the low and the moderate, non-uniform turbulence regimes (72 °F and 380 °F), no-beacon RPs maintain lower variability, potentially because the replay setting lacks the active feedback needed to exploit the beacon’s advantages.
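The block-level metric discussed above can be illustrated with a short Python sketch. We assume here that Block_CV is the standard deviation of the per-trial means within a non-overlapping 100-trial window, divided by the mean of those per-trial means and expressed in percent; this is our reading of the description, the exact form is given by (11), and the function name and toy data are hypothetical.

```python
from statistics import fmean, pstdev

def block_cv(trial_means, window=100):
    """Block_CV (%) for consecutive non-overlapping windows of per-trial means."""
    values = []
    for start in range(0, len(trial_means) - window + 1, window):
        block = trial_means[start:start + window]
        # Assumption: population standard deviation over the block
        values.append(100.0 * pstdev(block) / fmean(block))
    return values

# Toy data: 200 replay trials; the second block fluctuates more than the first
trial_means = [98.0, 102.0] * 50 + [95.0, 105.0] * 50

cvs = block_cv(trial_means)
print([f"{cv:.2f}%" for cv in cvs])
```

Because the metric only sees per-trial means, it captures trial-to-trial drift and is insensitive to fluctuations within a trial, which is why it can diverge from the temporal CV analyzed in the next subsection.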

4.2.2. Statistical Analysis of Temporal Signal Stability

The coefficient of variation (CV) distributions for replay pattern (RP) trials (calculated as in Section 4.1.2), shown in Figure 8 and summarized in Figure 9, illustrate how turbulence-induced temporal fluctuations depend on whether the RP originated from a policy trained with or without beacon feedback.

At 72 °F, both RP types exhibit narrow CV distributions concentrated between 1 and 5%, with the no-beacon RP showing a slightly higher proportion of very low-CV trials (<2.5%). The median CV values from the boxplots (Figure 9) confirm the similarity. At 380 °F, where turbulence is partially stratified due to one active internal heating element, the no-beacon-trained RP shows a broader CV distribution, with a slightly heavier tail extending beyond 15% for the CV, indicating occasional large fluctuations in received signal stability. In contrast, the RP from beacon-trained data yields a more compact distribution with fewer high-CV outliers, suggesting that the beam-shaping patterns learned under beacon feedback retain robustness when replayed without active feedback. At 460 °F, the no-beacon-trained RP presents a larger spread, with more outliers exceeding 15%. Similar to 380 °F, the RP from beacon-trained data maintains a reduced upper-tail spread, indicating that training with feedback still imparts temporal stability benefits, even when feedback is absent during operation.

Overall, these results suggest that policies trained with beacon feedback can yield replay patterns that are more resilient to turbulence-induced temporal fluctuations, particularly under nonuniform or severe conditions, despite operating without active feedback during playback.
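The summary columns used for the temporal CV (median, IQR, mean, and total range, as in Table 6 and Table 10) can be computed as in the sketch below. The function name and toy values are illustrative, and the quartile convention (Python's "inclusive" method) is an assumption, since the text does not state which one was used.

```python
from statistics import fmean, median, quantiles

def cv_summary(cv_percent):
    """Median, IQR, mean and total range of per-trial CV values (in %)."""
    # Assumption: inclusive quartile method; other conventions shift the IQR.
    q1, _, q3 = quantiles(cv_percent, n=4, method="inclusive")
    return {
        "median": median(cv_percent),
        "iqr": q3 - q1,
        "mean": fmean(cv_percent),
        "range": max(cv_percent) - min(cv_percent),
    }

# Toy per-trial CVs with one large outlier
cv_values = [1.0, 2.0, 3.0, 4.0, 15.0]
summary = cv_summary(cv_values)
print(summary)
```

In this toy case the single outlier raises the mean and the total range while leaving the median and IQR unchanged, which mirrors how upper-tail behavior is separated from typical behavior in the discussion above.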

4.3. RL Training and RP Comparison

A comparison of the results of Section 4.1 and Section 4.2 shows that the criteria used for parameter selection of the RPs resulted in consistently lower Block_CV values under all turbulence conditions than in the RL training runs (Table 9 versus Table 5). The average levels of logical ones (Table 8 versus Table 4) are comparable in most cases, except for 380 °F, where the values are higher for the RPs. This outcome demonstrates the effectiveness of the ninety-percent-of-maximum-recorded-power criterion used for selecting the RP parameters.
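One way such a selection rule could be implemented is sketched below in Python. This is an assumption-laden illustration rather than the authors' procedure: we assume candidate trials are those whose mean received level reaches 90% of the maximum recorded value, and that the lowest-CV candidate is then chosen; the function name, the tie-breaking, and the toy data are hypothetical.

```python
def select_replay_trial(mu, sigma, power_fraction=0.9):
    """Index of the lowest-CV trial among those reaching the power threshold."""
    threshold = power_fraction * max(mu)
    candidates = [i for i, m in enumerate(mu) if m >= threshold]
    # Assumption: stability is ranked by the per-trial CV, sigma_i / mu_i
    return min(candidates, key=lambda i: sigma[i] / mu[i])

# Toy per-trial statistics (arbitrary units)
mu = [80.0, 100.0, 95.0, 92.0, 60.0]
sigma = [1.0, 6.0, 2.0, 3.5, 0.5]

best = select_replay_trial(mu, sigma)
print(f"selected trial index: {best}")
```

Here the trial with the highest mean (index 1) is rejected in favor of a slightly weaker but steadier one (index 2), which is the trade-off between power and stability that the replay results reflect.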

A comparison between Table 6 for RL training and the corresponding RP statistics in Table 10 reveals distinct differences in the temporal stability of the received logical ‘1’ signals. Across all turbulence conditions, the interquartile ranges (IQRs) are lower for the RPs derived from the with-beacon data than for the corresponding RL training runs, indicating reduced variability across trials and the effectiveness of the pattern selection strategy in limiting temporal variability for the middle 50% of the dataset. Despite this, the mean CV and the total range for the RPs are higher in some cases, which points to the role of beacon and power feedback in preventing signal fluctuations during RL training. These results suggest that while RP-based operation provides benefits in terms of Block_CV compared to RL training (Table 5 and Table 9), it may allow for greater variability in temporal signal stability.

5. Conclusions

This study examined RL-based intensity optimization for FSO uplinks under varying turbulence levels, with emphasis on beacon-assisted training and RP evaluation. Results showed that beacon feedback offered the greatest benefit in severe turbulence, consistently reducing both temporal and block-level CV and suppressing extreme fluctuations. Under low turbulence, its effect was minimal, while in moderate, partially stratified turbulence, the feedback mechanism occasionally introduced variability due to dynamic realignment in non-uniform flow. Importantly, the RL agent demonstrated convergence in approximately 3.45 min under the measured laboratory loop latency. This convergence time lies within the typical 5–10 min contact duration of an LEO satellite ground pass, underscoring the potential feasibility of the approach when implemented with a faster DAQ and detector. Replay patterns—selected from the top 10% of high-intensity RL trials based on stability criteria—achieved block-level stability comparable or superior to that of active RL in several cases, even without active feedback.

Overall, the findings confirm that integrating contextual turbulence feedback into RL-based beam shaping can enhance FSO link stability in challenging conditions, while selective replay offers a practical fallback when feedback is unavailable. Future work will extend this approach to longer propagation distances and hybrid feedback strategies incorporating wavefront sensing.

Author Contributions

Methodology, E.S.; Software, E.S.; Resources, P.L.; Writing—original draft, E.S.; Writing—review & editing, P.L.; Visualization, E.S.; Supervision, P.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are openly available in the IEEE DataPort at https://dx.doi.org/10.21227/ekcp-dr66 (accessed on 15 September 2025).

Acknowledgments

The authors acknowledge the support of the University of Tulsa Faculty Research Grant Program, which provided funding for the spatial light modulator used in this study. Additional support was provided by the University of Tulsa Graduate Summer Fellowship.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Schematic of the experimental setup for real-time beam shaping with reinforcement learning control under simulated turbulence.

Figure 2. Experimental setup for an FSO link passing through a custom-built turbulence chamber.

Figure 3. Mean voltage (left column) and standard deviation (right column) of logical ‘ones’ per trial in RL training.

Figure 4. Histogram of the per-trial coefficient of variation (CV) of logical ‘1’ voltages under different temperature and feedback conditions.

Figure 5. Boxplot of the coefficient of variation (CV) of the logical ‘1’ voltages under different temperature and feedback conditions.

Figure 6. BER versus OSNR curves.

Figure 7. Mean voltage (left column) and standard deviation (right column) of logical ‘ones’ per trial in RPs.

Figure 8. Histogram of the per-trial coefficient of variation (CV) of logical ‘1’ voltages under different temperatures for replay patterns (RPs).

Figure 9. Boxplot of coefficient of variation (CV) of the logical ‘1’ voltages under different temperature and feedback conditions.

Table 1. Latency statistics of the RL–SLM control loop per trial with beacon processing (ms).

Component	Mean	Std	Median	Q1	Q3	IQR
Phase generation	362.935	111.818	339.655	317.480	370.768	53.287
SLM API call	48.033	17.067	45.046	40.158	50.572	10.414
DAQ acquisition	615.030	141.980	584.844	564.037	616.089	52.052
Processing	0.011	0.021	0.009	0.007	0.010	0.003
RL inference	1.164	1.079	1.063	0.716	1.303	0.588
Total step	1026.017	240.246	967.894	931.987	1020.394	88.408

Table 2. TD3 hyperparameters and policy architecture.

Item	Value/Setting
Policy (SB3)	`MlpPolicy` (actor & critic MLPs)
Network architecture	Two hidden layers: [400,300] units (ReLU activations)
Action space	$R^{14}$ (Zernike coefficients), elementwise bounds $[- 5, 5]$
Observation	Current action (14-dim)
Optimizer	Adam (actor and critic)
Learning rate	$1 \times 10^{- 3}$ (actor and critic)
Discount factor ( $γ$ )	0.99
Target update rate ( $τ$ )	0.005
Replay buffer size	1,000,000 transitions
Batch size	100
Training frequency and gradient steps	1 step, 1 gradient step (SB3 default for off-policy)
Policy delay	2 (critic updated every step; actor/target updated every 2 steps)
Target policy smoothing	Target noise std = 0.2; noise clip = 0.5
Exploration noise	Gaussian $N (0, 0 . 5^{2})$ per action dim (during training)
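
For orientation, the settings in Table 2 can be written as keyword arguments named after the Stable-Baselines3 (SB3) `TD3` constructor. The fragment below is a configuration sketch only: the mapping from table rows to SB3 argument names is our inference, not the authors' code, and the variable names are illustrative.

```python
# Keyword arguments named after the SB3 TD3 constructor (mapping inferred
# from Table 2; this is an assumption, not the authors' training script).
N_ZERNIKE = 14  # action dimension (Zernike coefficients)

td3_kwargs = {
    "policy": "MlpPolicy",
    "learning_rate": 1e-3,
    "buffer_size": 1_000_000,
    "batch_size": 100,
    "gamma": 0.99,
    "tau": 0.005,
    "train_freq": 1,
    "gradient_steps": 1,
    "policy_delay": 2,
    "target_policy_noise": 0.2,
    "target_noise_clip": 0.5,
    "policy_kwargs": {"net_arch": [400, 300]},
}

# Environment-side settings implied by Table 2
action_low = [-5.0] * N_ZERNIKE
action_high = [5.0] * N_ZERNIKE
exploration_sigma = [0.5] * N_ZERNIKE  # std of the Gaussian action noise
```

With SB3 installed and a Gymnasium-style environment wrapping the SLM loop (hypothetical here), these arguments would typically be passed as `TD3(env=env, action_noise=..., **td3_kwargs)`, with the exploration noise supplied through SB3's `NormalActionNoise`.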

Table 3. Training hardware platform.

Item	Specification/Measurement
CPU	Intel^® Core^TM i5-5300U @ 2.30 GHz (2 cores/4 threads)
Memory	8 GB RAM
GPU	None (CPU-only)
OS	64-bit Windows (x64)

Table 4. Average Rx power (μW) of logical ones in successive 100-trial windows.

Run Label	200–299	300–399	400–499	500–599	600–699	700–799	800–899	900–999
72 °F, high-fan, With Beacon	100.75	100.60	99.90	99.55	99.90	100.65	99.50	101.05
72 °F, high-fan, No Beacon	103.20	103.50	103.90	103.90	104.85	103.40	103.45	103.05
380 °F, med-fan, With Beacon	96.08	96.90	96.85	96.45	96.85	97.30	95.50	96.60
380 °F, med-fan, No Beacon	89.05	98.15	98.45	99.20	99.05	98.45	98.25	99.20
460 °F, high-fan, With Beacon	104.30	109.55	109.65	110.15	109.35	108.85	109.40	109.25
460 °F, high-fan, No Beacon	100.10	101.55	104.35	104.90	108.95	110.25	109.50	109.35

Table 5. Block-level coefficient of variation (Block_CV %) of logical ‘ones’ in successive 100-trial windows.

Run Label	200–299	300–399	400–499	500–599	600–699	700–799	800–899	900–999
72 °F, high-fan, With Beacon	4.63	4.81	4.96	6.56	5.60	4.67	6.21	4.66
72 °F, high-fan, No Beacon	5.66	4.40	4.06	3.51	3.73	4.53	4.24	3.96
380 °F, med-fan, With Beacon	6.35	6.59	5.20	5.96	6.19	5.39	7.37	5.63
380 °F, med-fan, No Beacon	8.18	5.70	4.77	4.15	4.19	5.49	5.33	4.19
460 °F, high-fan, With Beacon	6.16	4.20	3.55	3.50	4.05	4.00	3.75	3.37
460 °F, high-fan, No Beacon	6.89	5.71	4.91	6.04	4.59	4.33	4.49	3.98

Table 6. Statistical summary of temporal coefficient of variation (CV) for logical ‘ones’ beyond trial 200.

Condition	Median CV (%)	IQR (%)	Mean CV (%)	Total Range for CV (%)
72 °F, With Beacon	2.34	4.55	4.14	14.40
72 °F, No Beacon	3.60	4.33	4.80	14.59
380 °F, With Beacon	4.76	4.77	5.47	11.94
380 °F, No Beacon	4.53	4.41	5.41	13.04
460 °F, With Beacon	4.37	5.19	5.81	8.52
460 °F, No Beacon	4.00	3.86	5.32	14.45

Table 7. SLM-based recovered power in Rx for successive 100-trial windows (dB).

Run Label	200–299	300–399	400–499	500–599	600–699	700–799	800–899	900–999
72 °F, high-fan, With Beacon	2.08	2.07	2.04	2.03	2.04	2.07	2.03	2.09
72 °F, high-fan, No Beacon	2.18	2.19	2.21	2.21	2.25	2.19	2.19	2.18
380 °F, med-fan, With Beacon	1.74	1.78	1.78	1.76	1.78	1.80	1.72	1.77
380 °F, med-fan, No Beacon	1.41	1.84	1.85	1.88	1.88	1.85	1.84	1.88
460 °F, high-fan, With Beacon	1.86	2.07	2.07	2.09	2.11	2.06	2.07	2.06
460 °F, high-fan, No Beacon	1.68	1.74	1.86	1.88	2.05	2.10	2.07	2.06

Table 8. Average Rx power (μW) of logical ones in successive 100-trial windows for replay patterns (RPs).

Run Label	0–99	100–199
72 °F, high-fan, RP With Beacon Data	102.80	102.35
72 °F, high-fan, RP With No Beacon Data	106.05	105.65
380 °F, med-fan, RP With Beacon Data	103.40	103.10
380 °F, med-fan, RP With No Beacon Data	105.75	105.90
460 °F, high-fan, RP With Beacon Data	107.55	107.55
460 °F, high-fan, RP With No Beacon Data	109.95	110.35

Table 9. Block-level coefficient of variation (Block_CV %) of logical ‘ones’ in successive 100-trial windows for replay patterns (RPs).

Run Label	0–99	100–199
72 °F, high-fan, RP With Beacon Data	3.09	3.33
72 °F, high-fan, RP With No Beacon Data	1.67	1.66
380 °F, med-fan, RP With Beacon Data	2.91	2.68
380 °F, med-fan, RP With No Beacon Data	2.16	2.14
460 °F, high-fan, RP With Beacon Data	2.80	2.82
460 °F, high-fan, RP With No Beacon Data	3.23	3.33

Table 10. Statistical summary of temporal coefficient of variation (CV) for logical ‘ones’ in RPs.

Condition	Median CV (%)	IQR (%)	Mean CV (%)	Total Range for CV (%)
72 °F, With Beacon	3.86	4.20	4.82	13.59
72 °F, No Beacon	3.43	4.39	4.60	13.28
380 °F, With Beacon	4.35	4.18	5.37	14.11
380 °F, No Beacon	4.72	6.52	6.10	14.28
460 °F, With Beacon	4.77	4.46	5.68	14.79
460 °F, No Beacon	4.61	6.66	6.39	16.41

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
