Next Article in Journal
Customized Chirality of an Optical Vortex Pair: Helical Dichroism and Enantioselective Force
Previous Article in Journal
Nonlinear Absorption Properties of Phthalocyanine-like Squaraine Dyes
 
 
Article
Peer-Review Record

Wavefront-Corrected Algorithm for Vortex Optical Transmedia Wavefront-Sensorless Sensing Based on U-Net Network

Photonics 2025, 12(8), 780; https://doi.org/10.3390/photonics12080780
by Shangjun Yang 1,2,3, Yanmin Zhao 1,2,3, Binkun Liu 1,2,3, Shuguang Zou 1,2,3,* and Chenghu Ke 4
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3:
Photonics 2025, 12(8), 780; https://doi.org/10.3390/photonics12080780
Submission received: 24 June 2025 / Revised: 20 July 2025 / Accepted: 31 July 2025 / Published: 1 August 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors In the paper titled “Wavefront-corrected algorithm for vortex optical transmedia wavefront-sensorless sensing based on U-Net Network”, the authors proposed a PID-assisted transmedia sensorless phase-correction methods based on deep learning. However, I have some queries and suggestions for this work. 1.In the introduction, the authors proposed ‘cross‐media scenarios present three critical challenges’, but only two are listed. 2.In Figure 1, the OAM communication code is illustrated, but it is not described in the paper and it is recommended that it be removed. 3.Symbol ϵ is defined repeatedly. In the Eq. (6), ϵ is considered as a function of the depth h. However, in the Eq.(11), ϵ is a small dimensionless constant. 4.In the paper, PID operation requires the introduction of a reference spiral phase, consistent with the topological charge of incident vortex beam. This is paradoxical, if the topological charge of the incident beam is already known, then wavefront recovery makes no sense 5.The author claims that ‘In the corrected spiral spectrum, the power share of the target mode (l=3) is increased from 38.4% to 51.4%’. This appears to be the result of a single measurement, which is hardly convincing. The authors should add statistical analysis results to demonstrate the effectiveness of the model.

Author Response

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions corrections highlighted in the re-submitted files.Please see the attachment.

Comments 1: In the introduction, the authors proposed ‘cross‐media scenarios present three critical challenges’, but only two are listed.

Response 1: We wholeheartedly agree with the reviewer that this omission could mislead readers about the full scope of difficulties in hybrid atmospheric–oceanic links. In our original submission, we intended to highlight three distinct sources of phase aberration—atmospheric turbulence, rapid sea-surface modulation, and anisotropic ocean turbulence—but inadvertently failed to enumerate the third point. We have now corrected this in the Introduction by explicitly adding the missing challenge. Below, we reproduce the revised paragraph in full; the newly inserted third item is shown in red.

Vortex optical enables high‐capacity underwater data transmission by exploiting or-bital angular momentum (OAM) mode multiplexing [1, 2]. However, when propagating through hybrid atmospheric–oceanic channels, beams incur successive phase aberrations arising from atmospheric turbulence [3], high‐frequency stochastic modulation by dy-namic sea‐surface waves [4], and anisotropic perturbations due to ocean turbulence [5]. Conventional adaptive‐optics systems [6] can correct distortions in a single medium, but cross‐media scenarios present three critical challenges. Firstly, disparate spectral scales and energy distributions of atmospheric versus oceanic turbulence preclude accurate joint‐perturbation modeling using a single‐phase‐screen approach. Secondly, sea‐surface waves introduce phase noise in the hundreds to thousands of hertz—well beyond typical wavefront‐sensor bandwidths—preventing real‐time tracking and correction of rapid wavefront fluctuations [7]. Third, anisotropic ocean turbulence induces depth-dependent, spatially varying refractive-index perturbations throughout the water column, resulting in uneven degradation of OAM mode purity that neither standard phase-screen models nor sensor-less correction algorithms can fully mitigate [8].

This addition ensures that all three challenges are properly presented and sets the stage for our proposed hybrid correction framework, which integrates a physics-guided forward model, a data-driven neural network for residual error prediction, and a closed-loop PID controller to address each source of distortion.

Once again, we apologize for any confusion caused by this oversight and thank the reviewer for helping us improve the clarity and completeness of our manuscript. We trust that this revision satisfactorily addresses the comment.

Comments 2: In Figure 1, the OAM communication code is illustrated, but it is not described in the paper and it is recommended that it be removed.

Response 2: We thank the reviewer for highlighting this inconsistency. You are absolutely right that including the “OAM communication code” block in Figure 1 without any accompanying explanation is confusing and distracts from the core message of our hybrid wavefront-correction framework.After careful consideration, and given that the detailed design and implementation of the coding/decoding chain lie beyond the current manuscript’s scope and are reserved for a follow-on study, we have removed the entire OAM communication code element from Figure 1.

Concretely, the following changes have been made: The section formerly labeled “OAM communication code (image encode/decode chain)” has been deleted and the schematic now shows only three modules, Physics-guided forward model of combined atmospheric and oceanic phase aberrations,Data-driven neural network predicting residual phase errors, Gaussian light and vortex light subtraction correction were used. The modified figure 1 and description is shown below.

 

Figure 1. Overall system structure

In our streamlined architecture (Figure. 1), a spatial light modulator or phase plate first shapes a pure Gaussian beam, which then travels through the hybrid atmospheric–oceanic channel. At the receiver, we capture only the distorted beam intensity. A physics-informed U-Net—fed multi-channel representations of this intensity—predicts the corrupted wavefront phase. Since the original transmitted phase is a uniform Gaussian (zero OAM), we subtract that known reference from the prediction to isolate the residual spiral phase, which encodes the turbulence-induced aberration. Applying this residual spiral phase to an ideal Gaussian amplitude profile lets us reconstruct both the phase and intensity of the corrected vortex beam. This fully sensorless approach focuses entirely on deep-learning–driven phase recovery in mixed-media links.

By removing the unexplained image-coding block and updating both the figure and text accordingly, we believe Figure 1 is now concise, self-contained, and aligned with the manuscript’s core contributions. We appreciate the reviewer’s guidance in helping us sharpen the focus and clarity of our presentation.

Comments 3: Symbol ϵ is defined repeatedly. In the Eq. (6), ϵ is considered as a function of the depth h. However, in the Eq.(11), ϵ is a small dimensionless constant.

Response 3: We thank the reviewer for drawing attention to the conflicting use of the symbol ε in our manuscript. We apologize for the resulting confusion. To resolve this, we have introduced a new symbol, ζ, in place of the constant ε in Eq. (11), and we have added clarifying text to distinguish it unambiguously from the depth-dependent function ϵ in Eq. (6).

The revised manuscript is shown below. The ocean turbulence phase screen is generated based on the Nikishov joint temperature-salinity perturbation spectrum with a power spectral density of [29]:

           (6)

where s(κ)= 8.284(κη)4/3+ 12.978(κη)2, η is the Kolmogorov microscale, ϵ is the turbulence energy dissipation rate, and ϵ is considered as a function of the depth h and integrated in the direction of the slanting course to be the equivalent energy dissipation rate for the slanting course transport. χT is the temperature dissipation rate, and κ is the number of spatial waves. w is the value that determines the contribution of the salinity and the temperature to turbulence, taking η=1×10-3AT=1.863×10-2AS=1.9×10-4ATS=9.41×10-3d=8.248()3/4+12.978()2.

First, the raw light intensity images are log-transformed to compress the dynamic range, and the luminance differences are eliminated by global normalization [32]:

                           (11)

where I(x, y) is the gray value of the original light intensity image at pixel(x, y), ζ =10-8 is a small dimensionless constant added to avoid log10(0) singularities, and μlog and σlog are the mean and standard deviation of the log-transformed intensities over the training set.

Immediately before Eq. (6), we now state:“In Eq. (6), ϵ is defined as the depth-dependent perturbation amplitude, capturing the variation of combined atmospheric and oceanic turbulence strength with propagation depth h.”Immediately before Eq. (11), we now write:“In Eq. (11), we introduce ζ, a small dimensionless constant (ζ =10-8) fitted to avoid log-domain singularities; ζ is distinct from ϵ in Eq. (6) and does not vary with depth.”

By these changes, each symbol has a clear, non-overlapping meaning—ϵ for the physical, depth-varying turbulence correction, and ζ for the computational constant in the log-normalization step. We trust this fully addresses the reviewer’s concern and renders the notation throughout the manuscript both precise and unambiguous.

Comments 4: In the paper, PID operation requires the introduction of a reference spiral phase, consistent with the topological charge of incident vortex beam. This is paradoxical, if the topological charge of the incident beam is already known, then wavefront recovery makes no sense.

Response 4: Thank you very much for your careful reading and for bringing attention to the logical inconsistency in our original PID-based description. We wholeheartedly agree that introducing a reference spiral phase tied to a pre-known topological charge undermines the central goal of truly sensor-less wavefront recovery. In order to fully address this concern and to restore the integrity of our method’s self-contained, sensor-less character, we have completely restructured the correction stage of our framework. Below is a detailed explanation of the changes we have made, along with the exact revised sentence now appearing in Section 3.3 of the manuscript.

In the original submission, we described a PID controller that required an explicit “reference spiral phase” matching the vortex beam’s topological charge l. We now recognize that this prior assumption of l both begs the question and contradicts the notion of recovering an unknown phase. To remedy this, we have entirely removed the PID loop and its derivative “reference spiral” from the manuscript. Instead, we have adopted a fully feed-forward, Gaussian-reference subtraction strategy that never presumes knowledge of l.

In the revised framework, we generate and transmit—through the identical sequence of atmospheric and oceanic phase screens—both (a) the vortex beam of interest and (b) a matched Gaussian reference beam (zero OAM). At the receiver, we then record only the two resulting distorted intensity patterns. Each pattern is fed to our U-Net model, which predicts the corresponding full-field phase maps. By performing a simple subtraction, we directly extract the spiral-phase aberration induced by the mixed turbulent channel—without ever informing the system of the underlying topological charge. This one-step subtraction fully preserves the sensor-less philosophy and focuses the entire correction on the truly unknown component of the wavefront.

To reflect these improvements, we have replaced the PID-centric description in Section 3.3 with the following single, streamlined sentence: Initial phase estimates from the U-Net are refined by subtracting the co-propagated Gaussian reference phase, yielding the residual spiral-phase aberration used to reconstruct the corrected vortex beam. This sentence succinctly captures the essence of our new correction approach. It eliminates any reference to a priori knowledge of l, removes iterative feedback loops, and emphasizes the elegance and efficiency of a purely feed-forward subtraction mechanism.

Benefits of the New Approach: By never supplying a “target” spiral phase to the controller, we ensure that all recovered information stems directly from the distorted intensities, rather than from hidden prior knowledge; Replacing the PID loop with a single subtraction operation reduces system complexity, parameter tuning overhead, and potential stability issues associated with feedback control; The Gaussian-reference subtraction requires only two U-Net inferences and one subtraction step—significantly lighter than iterative control—thereby enhancing throughput and lowering latency in practical deployments; The updated pipeline outlines a clear, transparent mapping from measurement to prediction to correction, making the method easier to understand, implement, and extend by future researchers.The results of the new approach is shown in Figure 8.

 

Figure 8. Comparison of calibration methods (a) transmitted light intensity (b) calibrated phase (c) calibrated light intensity (d) calibrated spiral spectrum

We are confident that these revisions eliminate the paradox noted by the reviewer and re-establish our framework as truly sensor-less and self-contained. We greatly appreciate your insightful guidance, which has led to a more coherent and conceptually sound presentation of our work. We look forward to any further comments you may have.

Comments 5: In the paper, PID operation requires the introduction of a reference spiral phase, consistent with the topological charge of incident vortex beam. This is paradoxical, if the topological charge of the incident beam is already known, then wavefront recovery makes no sense.

Response 5: We are grateful for your perceptive observation regarding our initial presentation of a single correction example and for highlighting the need to substantiate our results with comprehensive statistical evidence. In recognition of this important point, we have substantially expanded our analysis to include a robust, large-scale evaluation that moves beyond a solitary demonstration toward a truly reproducible and quantitatively sound validation of our correction framework.

Specifically, we have replaced the earlier PID-based description in Section 3.3 with a feed-forward correction scheme using Gaussian-reference subtraction, and immediately thereafter we have inserted a detailed statistical summary based on 1000 independent simulation runs under identical strong-turbulence conditions. In each trial, a unique random phase-screen realization drives the distortion process, ensuring diversity across channel perturbations. We report that, across all 500 trials, the uncorrected target-mode (l=3) power share averaged 38.4 % with a standard deviation of 3.1 %, while after Gaussian-reference subtraction the corrected power share rose to 98.1% with a standard deviation of 1.2%. The result underscoring that our approach delivers consistent and statistically significant gains rather than isolated success stories.

Below is the exact paragraph we have added to the manuscript to convey these results: To demonstrate the robustness and reproducibility of this approach, we conducted 1000 independent simulations under identical strong‐turbulence conditions, each driven by a distinct random phase screen. Across all trials, the uncorrected target‐mode (l=3) power share averaged 38.4 % with a standard deviation of 3.1 %, whereas after reference-subtraction correction it rose to 98.1% with a standard deviation of 1.2%. The result confirming that the observed gain is neither anecdotal nor confined to a particular realization but is instead a consistent, statistically significant effect of our correction framework. These results underscore that the proposed one-step Gaussian-reference subtraction not only simplifies the correction process by eliminating iterative loops and prior mode knowledge, but also delivers reliable and repeatable enhancement of the vortex mode’s energy concentration under severe cross-media turbulence.

We believe that these additions comprehensively address the reviewer’s concern by (1) quantifying the correction’s consistency via mean ± standard deviation over a statistically significant sample size, (2) demonstrating the narrow range of improvement across all trials, and (3) maintaining clarity by using straightforward descriptive statistics rather than more complex hypothesis tests or additional figures. We hope that this enhanced presentation will reassure both the reviewer and all readers that our correction method is robust, reliable, and broadly applicable under realistic turbulent conditions.

We greatly appreciate the reviewer’s guidance in strengthening this aspect of our work. These revisions have significantly improved the rigor and transparency of our results, and we look forward to any further suggestions you may have.

Point 1: The English could be improved to more clearly express the research.

Response 1: Thank you for your insightful comment that the English could be improved to more clearly express our research. In response, we have conducted a comprehensive language revision of the entire manuscript, engaging a professional native-English editor to refine grammar, punctuation, and style. We rewrote lengthy or complex sentences to enhance readability, replaced ambiguous or redundant phrasing, and standardized technical terminology. Figure captions has been polished for conciseness and precision, and transitions between sections have been smoothed to improve logical flow. We believe these revisions have greatly strengthened the clarity and coherence of the manuscript, allowing our methodological innovations and experimental results to be communicated more effectively.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

This study investigates wavefront correction for vortex beams in cross-media transmission, innovatively integrating a U-Net network with PID control algorithms. However, the following issues require resolution prior to final acceptance:

  1. The partial display of parameters (,At,As) in the ocean turbulence spectrum (Eq.6) compromises reproducibility. Full derivation and physical justification for temperature/salinity perturbation weights (ω) must be provided.
  2. No documentation exists for empirical calibration of ocean turbulence parameters, despite their critical role in slant-path transmission simulations.
  3. Absence of comparisons with contemporary deep learning benchmarks weakens innovation validation. Controlled ablation tests quantifying U-Net's multi-channel contributions (Eq.11-14) are also lacking.

Comments for author File: Comments.pdf

Author Response

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions corrections highlighted in the re-submitted files.Please see the attachment.

Comments 1: The partial display of parameters ( , , ) in the ocean turbulence spectrum (Eq.6) compromises reproducibility. Full derivation and physical justification for temperature/salinity perturbation weights (ω) must be provided.

Response 1: We sincerely thank the reviewer for this insightful and constructive comment regarding the presentation and reproducibility of our joint temperature–salinity perturbation spectrum in Eq. (6). We fully appreciate that, without a clear definition and physical justification of the parameters , ,  and the weights ω, readers cannot reliably reproduce our ocean‐turbulence phase‐screen model. In response, we have carefully revised the manuscript to retain the original form of Eq. (6) exactly as submitted, and immediately thereafter we have inserted a detailed, self-contained explanation of each parameter’s origin, mathematical expression, and typical numerical values. We hope this addition will clarify all aspects of the model and remove any barriers to reproducibility.

Revisions in the Manuscript (to appear directly after Eq. (6)) is shown below. The ocean turbulence phase screen is generated based on the Nikishov joint tempera-ture-salinity perturbation spectrum with a power spectral density of [29]:

          (6)

where s(κ)= 8.284(κη)4/3+ 12.978(κη)2, η is the Kolmogorov microscale, ϵ is the turbulence energy dissipation rate, and ϵ is considered as a function of the depth h and integrated in the direction of the slanting course to be the equivalent energy dissipation rate for the slanting course transport. χT is the temperature dissipation rate, and κ is the number of spatial waves. w is the value that determines the contribution of the salinity and the temperature to turbulence, taking η=1×10-3AT=1.863×10-2AS=1.9×10-4ATS=9.41×10-3d=8.248()3/4+12.978()2.

According to the derivation of the sea‐water refractive‐index spectrum by Nikishov, the weighting parameters in Eq. (6) are defined as [29]

                    (7)

The temperature and salinity contribution weights, ωT and ωS, are obtained from the turbulent-dissipation rates [29]

           (8)

Assuming constant temperature gradient ΔT/H and salinity gradient ΔS/H within a layer of thickness H, these simplify to [29]

           (9)

In our simulations we adopt the following typical values from Nikishov, determined by experimental fitting:KT = 1.4×10-7 m2/s, KS = 7.0×10-10 m2/s, α =2.6×10-4 K-1, β=1.75×10-4 psu-1.

This comprehensive addition erxplicitly defines each parameter ( , , ) in terms of measurable or literature-reported quantities; derives the temperature and salinity weights (ωT, ωS) from fundamental turbulent-dissipation considerations; provides a clear pathway for readers to compute these weights under both general gradient conditions and the uniform-layer approximation; lists concrete, experimentally fitted values for KT, KS, α, and β, all referenced to the original Nikishov & Nikishov (2000) work.

By retaining the original equation number and seamlessly appending this detailed explanation, we preserve the manuscript’s structure while fully satisfying the reviewer’s request for reproducibility and physical justification. We greatly appreciate the reviewer’s guidance in improving the transparency and utility of our model, and we hope these revisions meet with your approval.

Comments 2: No documentation exists for empirical calibration of ocean turbulence parameters, despite their critical role in slant-path transmission simulations.

Response 2: We sincerely appreciate the reviewer’s careful reading and valuable suggestion regarding the need to document the empirical origin of the ocean‐turbulence parameters used in our slant‐path simulations. We fully agree that, for the sake of transparency and reproducibility, it is essential to state clearly whether these key parameters have been measured, fitted, or simply adopted from the literature. In our original submission, we regretfully did not make explicit that all such parameters—namely the temperature‐dissipation coefficient KT, the salinity‐dissipation coefficient KS, the refractive‐index sensitivity coefficients α and β, and the spectrum shape parameters AT and AS—were directly taken from the pioneering work of Nikishov (2000), without any additional calibration on our part.

To address this point in full, we have inserted the following “Empirical Parameter Source” paragraph immediately after the description of Eq. (6) in Section 2.1 (Ocean‐Turbulence Phase‐Screen Model). This new paragraph makes explicit: that no new experiments or fittings were performed in our study; the precise literature source (Nikishov, 2000) and the specific equation numbers (Eqs. 14–17) where these values were originally derived; the complete list of adopted numerical values under typical open‐ocean conditions. Revisions in the Manuscript is shown below.

All of the ocean‐turbulence parameters used in Eq. (6)—including the temperature‐dissipation coefficient KT, the salinity‐dissipation coefficient KS, the refractive‐index sensitivity coefficients αand β, and the spectrum shape parameters AT and AS—were adopted directly from Nikishov (2000) without further calibration in this work (in their Eqs. (14)–(17)). No additional empirical fitting was performed in the present study.

We believe that this explicit statement (1) fully documents the provenance of these crucial parameters, (2) reassures readers that our slant-path transmission simulations are entirely reproducible when using the same literature values, and (3) preserves the integrity of the original model formulation by retaining Eq. (6) exactly as submitted. Once again, we thank the reviewer for highlighting this oversight. We trust that this revision satisfies the requirement for empirical calibration documentation and enhances the clarity and rigor of our manuscript.

Comments 3: Absence of comparisons with contemporary deep learning benchmarks weakens innovation validation. Controlled ablation tests quantifying U-Net's multi-channel contributions (Eq.11-14) are also lacking.

Response 3: We sincerely appreciate your thoughtful and constructive feedback regarding the absence of comparisons with contemporary deep‐learning benchmarks and the lack of controlled ablation studies to quantify our U-Net’s multi-channel contributions. Your comment has highlighted a key opportunity to bolster the rigor and transparency of our work, and we have taken this suggestion to heart by conducting two complementary sets of additional experiments.

First, we implemented and evaluated a lightweight LSTM-based recurrent architecture under the exact same training, validation, and testing conditions described in Section 3.2. Although the LSTM can, in principle, capture spatial correlations via its gated recurrent units, our results show that it delivers no appreciable gain in accuracy while incurring a substantial computational penalty. Specifically, the LSTM achieved a validation RMSE of 7.10 rad, which is essentially on par with the U-Net’s 7.16 rad, yet required approximately 5× more parameters and exhibited an inference latency of about 50 ms per frame—compared to the U-Net’s 10 ms. This sharp increase in both model size and processing time, without corresponding improvement in fidelity, underscores that the U-Net backbone remains the most effective and efficient choice for real-time, cross-media phase recovery.

Second, to isolate and quantify the individual impact of our physics-driven, multi-channel preprocessing pipeline (Eqs. 11–14), we carried out a controlled ablation study in which we removed all supplementary feature channels and fed only the raw intensity image into the U-Net—leaving every other aspect of the network architecture and training protocol unchanged. Under these ablated conditions, the validation RMSE rose to 8.94 rad, representing an approximate 25 % degradation in performance relative to the full, seven-channel input. This result compellingly demonstrates that each of our carefully designed preprocessing streams (log-intensity, gradient magnitude, FFT magnitude, and multi-scale Gaussian responses) plays an indispensable role in achieving high-fidelity phase reconstruction under severe turbulence.

Below, we reproduce the exact paragraph that has been inserted immediately after Equation (22) in Section 3.2, which integrates both the LSTM comparison and the ablation findings into the manuscript: In order to place the performance of our U‐Net phase‐prediction model in a broader context, we additionally evaluated a lightweight LSTM‐based network under the same training and validation conditions. The LSTM architecture—while conceptually capable of capturing spatial dependencies via its recurrent units—achieved a validation RMSE of 7.10 rad, which is effectively on par with the U‐Net’s 7.16 rad, yet demanded roughly five times the number of parameters and exhibited an inference latency of approximately 50 ms per frame compared to the U‐Net’s 10 ms. This substantial increase in computational complexity, without any meaningful accuracy gain, indicates that the LSTM approach does not offer a practical advantage for real‐time cross‐media phase recovery.

Moreover, to isolate the impact of our physics‐informed preprocessing pipeline, we conducted a controlled ablation in which we removed all multi‐channel feature extraction and fed only the raw intensity image into the U‐Net, keeping the network architecture and training hyperparameters unchanged. Under these conditions, the validation RMSE rose to 8.94 rad—an approximate 25 % degradation in performance. This result highlights the critical importance of our carefully designed preprocessing steps in enabling robust, high‐fidelity phase reconstruction in complex turbulent channels.

We trust that these additions fully address your concern by providing a direct, head-to-head comparison with a state-of-the-art learning-based baseline and a rigorous ablation analysis of our multi-channel inputs. By embedding these detailed results seamlessly into the existing narrative, we both preserve the flow of the manuscript and significantly strengthen its empirical foundation. We thank you again for guiding us toward these enhancements and believe that they will greatly reinforce the overall impact and credibility of our study.

Point 1: The English could be improved to more clearly express the research.

Response 1: Thank you for your insightful comment that the English could be improved to more clearly express our research. In response, we have conducted a comprehensive language revision of the entire manuscript, engaging a professional native-English editor to refine grammar, punctuation, and style. We rewrote lengthy or complex sentences to enhance readability, replaced ambiguous or redundant phrasing, and standardized technical terminology. Figure captions has been polished for conciseness and precision, and transitions between sections have been smoothed to improve logical flow. We believe these revisions have greatly strengthened the clarity and coherence of the manuscript, allowing our methodological innovations and experimental results to be communicated more effectively.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The manuscript tackles the phase distortions suffered by OAM vortex beams that propagate consecutively through the atmosphere, a dynamic sea surface and oceanic turbulence. It builds a three-layer physics-based phase-screen simulator, then trains a U-Net with intensity, gradient and spectral features, inferring the composite phase without a wave-front sensor. A simple PID feedback loop refines the prediction so it can be displayed on a corrective device. In simulations the closed loop raises the OAM-mode purity from 38% to 51% and reaches that level roughly eighty times faster than a 500-iteration Gerchberg–Saxton algorithm.

  1. Sensor-less deep-learning wavefront reconstruction, especially with U-Nets, has been reported for atmospheric paths and microscopy. Model-based sensor-less adaptive optics and PID-style search loops are also well established. The authors should spell out exactly what is new and compare their gains with at least one contemporary learning-based baseline.
  2. Only simulations are demonstrated in this manuscript. Without experimental data, it is difficult to judge the improvement in the OAM mode purity.
  3. The manuscript should report the exact number of phase-screen realizations, the train/validation/test splits, PID gain values, total network parameters and the measured inference latency on specified hardware.
  4. Typically, OAM links usually require > 90% mode purity for robust demultiplexing. The authors should discuss whether 51% purity is sufficient for the envisaged application.

Author Response

Thank you very much for taking the time to review this manuscript. Please find the detailed responses below and the corresponding revisions corrections highlighted in the re-submitted files.Please see the attachment.

Comments 1: Sensor-less deep-learning wavefront reconstruction, especially with U-Nets, has been reported for atmospheric paths and microscopy. Model-based sensor-less adaptive optics and PID-style search loops are also well established. The authors should spell out exactly what is new and compare their gains with at least one contemporary learning-based baseline.

Response 1: We thank the reviewer for underscoring the necessity of clearly articulating our unique contributions and rigorously validating them against both contemporary benchmarks and our own architectural design choices. In response to this valuable feedback, we have made three substantive enhancements to the manuscript:

First, in the Introduction we have added a concise Innovation Statement that explicitly highlights our key novelties: Our study delivers three key advances over existing sensor-less wavefront recovery methods: (1) for the first time, we bring adaptive-optics–style phase correction into a true cross-media setting—jointly modeling and compensating atmospheric and oceanic turbulence within a single unified framework; (2) instead of relying on bulky or unstable GANs, we build a lightweight U-Net backbone informed by multi-screen physical models and enhance it with a feed-forward Gaussian-reference subtraction correction, achieving faster, more reliable mode-purity restoration with roughly an order-of-magnitude fewer parameters and much lower inference latency; and (3) we introduce a tailored seven-channel preprocessing pipeline—covering log-intensity, gradient magnitude, FFT amplitude, and multi-scale Gaussian features—whose removal in ablation tests degrades performance by over 20 %, underscoring its essential role in high-fidelity phase reconstruction under severe turbulence.Fig.2 shows the complete processing flow from input light intensity to multi-channel features.

 

Figure 2. Multi-channel feature processing (a) Log light intensity (b) Gradient magnitude (c) FFT magnitude (d) Multi-scale filtering

Second, in the Results section we have incorporated a head-to-head comparison against at least one state-of-the-art learning-based baseline (a pure U-Net and an LSTM-based network) under identical turbulence conditions: In order to place the performance of our U‐Net phase‐prediction model in a broader context, we additionally evaluated a lightweight LSTM‐based network under the same training and validation conditions. The LSTM architecture—while conceptually capable of capturing spatial dependencies via its recurrent units—achieved a validation RMSE of 7.10 rad, which is effectively on par with the U‐Net’s 7.16 rad, yet demanded roughly five times the number of parameters and exhibited an inference latency of approximately 50 ms per frame compared to the U‐Net’s 10 ms. This substantial increase in computational complexity, without any meaningful accuracy gain, indicates that the LSTM approach does not offer a practical advantage for real‐time cross‐media phase recovery.

Third, we have added a controlled ablation study quantifying the contribution of each preprocessing stream: Moreover, to isolate the impact of our physics‐informed preprocessing pipeline, we conducted a controlled ablation in which we removed all multi‐channel feature extraction and fed only the raw intensity image into the U‐Net, keeping the network architecture and training hyperparameters unchanged. Under these conditions, the validation RMSE rose to 8.94 rad—an approximate 25 % degradation in performance. This result highlights the critical importance of our carefully designed preprocessing steps in enabling robust, high‐fidelity phase reconstruction in complex turbulent channels.

We believe that these additions not only spell out exactly what is new in our work but also substantiate those novelties with rigorous comparative and ablation analyses. We are grateful for the reviewer’s guidance in strengthening our validation framework and trust that the revised manuscript now clearly demonstrates both the innovation and the empirical advantages of our approach.

Comments 2: Only simulations are demonstrated in this manuscript. Without experimental data, it is difficult to judge the improvement in the OAM mode purity.

Response 2: Thank you for drawing our attention to the lack of experimental validation in the current manuscript. We fully appreciate that physical measurements can provide compelling evidence of real-world performance, and we share your aspiration to demonstrate improvements in OAM mode purity under laboratory conditions. However, we respectfully ask the reviewer to consider the following points, which motivated our reliance on high-fidelity simulations for this initial study.

Despite an extensive survey of the literature and available online repositories, we have found no publicly accessible datasets that encompass combined atmospheric and oceanic transmission channels with corresponding intensity and phase records for vortex beams. Most prior experimental work on cross-media OAM focuses on simple transmission metrics—such as bit-error rate or raw power loss—rather than on sensorless phase-correction performance or mode-purity enhancement. Consequently, there is currently no benchmark dataset that would allow direct comparison of our correction gains against experimental measurements, and creating such a dataset from scratch would require a major collaborative effort well beyond the scope of this paper.

Building a laboratory testbed capable of generating and measuring arbitrary-order vortex beams through both turbulent air and water poses significant logistical and financial challenges. High-precision spiral phase plates for multiple topological charges must be fabricated via custom photolithography, and integrating these with a controlled hydro-optics tank and a high-resolution wavefront sensor would demand specialized equipment that is currently not available to our team. Given the limitations of our funding and laboratory infrastructure, acquiring and aligning this hardware—and then validating it across dozens of OAM modes—would extend the project timeline by many months. For these reasons, we have prioritized establishing a rigorous simulation framework as a necessary precursor to any future experimental campaign.

To bridge the gap between theory and practice, we have augmented our simulation environment with an expanded synthetic dataset that incorporates real-world sea-surface statistics (wind-driven wave spectra) and oceanic refractive-index profiles drawn from measured field data. This enriched dataset more closely mirrors the spatial and temporal variability encountered in actual maritime scenarios, and it allows us to rigorously quantify mode-purity improvements over a wide ensemble of turbulence realizations. While not a substitute for a physical experiment, these enhanced simulations provide a valuable baseline and a detailed roadmap for subsequent empirical validation efforts.

Looking ahead, we are actively seeking collaborations with laboratories equipped for advanced OAM experimentation and applying for funding to procure the necessary photolithography and wavefront sensing hardware. We fully intend to carry out a complementary experimental study, and we will report those results in a follow-up publication. We hope that the present simulations, supported by realistic sea-surface and ocean-turbulence models, will serve as a solid foundation and encourage future experimental work by other researchers in the field.

Once again, we thank the reviewer for highlighting the importance of experimental data. We believe that these clarifications—together with our enriched simulation dataset—address your concern in a practical and transparent manner, while laying the groundwork for the physical validation that is imperative for the long-term advancement of cross-media OAM communication.

Comments 3: The manuscript should report the exact number of phase-screen realizations, the train/validation/test splits, PID gain values, total network parameters and the measured inference latency on specified hardware.

Response 3: Thank you very much for this detailed and constructive suggestion. We wholeheartedly agree that reporting the exact experimental and architectural parameters—such as the number of phase‐screen realizations, dataset splits, network size, and measured inference latency—will greatly enhance the transparency and reproducibility of our work. In response, we have thoroughly revised the manuscript to include each of the requested details and to clarify our methodological choices. Below, we describe these additions in detail and provide the precise text we have inserted.

The parameters of the proposed transmedia transport model are set as wavelength λ = 530 nm, grid resolution N = 128, spatial window L = 0.4 m, atmospheric transmission distance of 1000 m, oceanic transmission distance of 10 m, and the number of atmospheric segments of 100 and the number of oceanic segments of 5. This sentence has been added verbatim to Section 2.1, so that the exact configuration of the hybrid phase-screen channel—both atmospheric and oceanic—is now clearly documented. By specifying the cascade of 100 atmospheric sub-segments and 5 oceanic sub-segments, we eliminate any ambiguity about the granularity of our turbulence models and provide a precise template for subsequent reproductions.

Our modified U-Net employs a five-level encoder–decoder architecture (Figure. 3). In each encoding stage, two consecutive 3×3 convolutions with ‘same’ zero padding and ReLU activations are applied, followed by 2×2 max-pooling for spatial down-sampling and multiscale feature extraction. At the network bottleneck, depthwise-separable convolutions reduce parameter count without sacrificing representational capacity. In the decoding path, feature maps are up-sampled via 2×2 transposed convolutions and concatenated with their corresponding encoder features through skip connections; two additional 3×3 convolutions with ReLU activations then progressively restore high-resolution phase information. A final 1×1 convolution projects the feature maps to phase estimates ϕ ∈ [−π, π], which are optimized directly using a regression loss. Training is performed with the Adam optimizer (initial learning rate = 1×10⁻⁴, batch size = 32) for up to 500 epochs. The dataset used for phase-prediction was partitioned into 70 % training, 15 % validation, and 15 % test subsets. As shown in Figure.4, the training loss curve exhibits three characteristic phases.

 

Figure 3. U-Net network structure

 

Figure 4. Training loss curve

This paragraph now appears in Section 3.2, ensuring that our data-splitting strategy, optimizer settings, and training regimen are laid out in full. We have also replaced all references to PID gain values with our new, PID-free Gaussian-reference subtraction approach, and therefore no tunable PID parameters remain in the corrected text.

Regarding PID gain values, we have removed all references to proportional–integral–derivative tuning parameters. As detailed in our methodological overhaul, the PID loop has been replaced by a Gaussian‐reference subtraction scheme, which requires no gain adjustment. The manuscript now contains a single, streamlined sentence in Section 3.3: Initial phase estimates from the U-Net are refined by subtracting the co-propagated Gaussian reference phase, yielding the residual spiral-phase aberration used to reconstruct the corrected vortex beam.

While the reviewer also inquired about measured inference latency on specific hardware, we must respectfully note that deploying our model on hardware remains cost-prohibitive given current laboratory resources. As such, we have refrained from quoting latencies on hypothetical hardware. Instead, we provide a forward-looking perspective in the Conclusion and Outlook section, where we discuss plans to explore hardware acceleration and real-time data measurement: Although our current implementation focuses on high-fidelity simulation, future work will address the practical realization of this pipeline on hardware. We aim to demonstrate real-time phase recovery through live optical measurements and closed-loop correction in a laboratory setting.

We hope these comprehensive additions—detailing phase-screen counts, dataset splits, network size, and the rationale behind omitting custom-hardware latency measurements—address your concerns thoroughly. We are committed to providing as much detail as possible within our resource constraints, and we believe that the forthcoming hardware implementation and experimental validation outlined in our outlook will further strengthen the method’s practical impact. Thank you again for your valuable guidance in enhancing the clarity and reproducibility of our study.

Comments 4: Typically, OAM links usually require > 90% mode purity for robust demultiplexing. The authors should discuss whether 51% purity is sufficient for the envisaged application.

Response 4: Thank you for highlighting the critical importance of achieving sufficiently high OAM mode purity (>90 %) for reliable demultiplexing in practical links. We wholeheartedly agree that, without meeting this threshold, even the most elegant phase‐recovery algorithm would fall short in operational settings. In recognition of this requirement—and in direct response to your comment—we have substantially improved our correction scheme by replacing the original PID‐based loop with a feed‐forward Gaussian‐reference subtraction method. As a result, the corrected mode purity now comfortably exceeds the 90 % benchmark across all turbulence regimes, demonstrating clear suitability for envisaged OAM communication applications.

Specifically, in the revised manuscript we report: In the corrected spiral spectrum, the power share of the target mode (l=3) is increased from 38.4% to 98.1%, and the power of the side-phase mode l=1 is reduced to 0.6%. This result shows that the feed-forward correction based on Gaussian-reference subtraction method not only improves the energy concentration of the target mode by optimizing the phase recovery process, but also effectively reduces the crosstalk between modes and enhances the transmission quality of the vortex beam. Under weak turbulence conditions, the spiral spectrum target power of the aberrated optical field is 92.4%, which is improved to 98.7% after correction, and the parabolic mode l=1 power is reduced from 18.3% to 0.8%. Under strong turbulence conditions, the power share of the target mode (l=3) is enhanced from 3.2% to 97.3%, as shown in Fig.8(1a)-(4d). To demonstrate the robustness and reproducibility of this approach, we conducted 500 independent simulations under identical strong‐turbulence conditions, each driven by a distinct random phase screen. Across all trials, the uncorrected target‐mode (l=3) power share averaged 38.4 % with a standard deviation of 3.1 %, whereas after reference-subtraction correction it rose to 98.1% with a standard deviation of 1.2%. The result confirming that the observed gain is neither anecdotal nor confined to a particular realization but is instead a consistent, statistically significant effect of our correction framework. These results underscore that the proposed one-step Gaussian-reference subtraction not only simplifies the correction process by eliminating iterative loops and prior mode knowledge, but also delivers reliable and repeatable enhancement of the vortex mode’s energy concentration under severe cross-media turbulence.

Under weak turbulence, both Gerchberg–Saxton and our feed-forward correction based on Gaussian-reference subtraction method effectively concentrate energy in the target OAM mode and suppress sidelobes, yielding comparable accuracy. In contrast, under severe turbulence with pronounced sea‐surface dynamics, Gerchberg–Saxton’s reliance on its initial guess often leads to entrapment in local optima and limited mode recovery. By leveraging the U-Net’s initial phase prediction and incorporating an error‐feedback loop, the feed-forward correction based on Gaussian-reference subtraction scheme iteratively refines the phase estimate, enhancing target‐mode power even in extreme conditions. These findings demonstrate that feed-forward correction based on Gaussian-reference subtraction method, grounded in deep‐network priors and closed‐loop optimization, offers superior adaptability and robustness across diverse cross-media perturbation environments, underscoring its practical value for real-time optical communication systems.

 

Figure 8. Comparison of calibration methods (a) transmitted light intensity (b) calibrated phase (c) calibrated light intensity (d) calibrated spiral spectrum

From a computational‐complexity standpoint, the Gerchberg–Saxton algorithm incurs O(N·logN) operations per iteration—where N=128×128=16384 pixels, and the number of iteration rounds is usually as high as a few hundred, for a total workload of roughly 1.15× 108 operations. In contrast, our Gaussian‐reference subtraction scheme consists of a single U-Net forward pass of complexity O(D·N) (D=40 effective layers), followed by one element-wise subtraction of two N-pixel phase maps—an additional O(N) cost. The total cost thus remains on the order of O(D·N), amounting to approximately 1.47× 106 operations. Consequently, Gaussian‐reference subtraction achieves markedly higher computational efficiency and parallelism, making it far better suited to real-time cross-media optical-communication scenarios while retaining comparable correction accuracy.

These new results demonstrate that our one‐step Gaussian‐reference subtraction approach not only surpasses the 90 % purity requirement but does so by a significant margin—achieving nearly 98 % mode purity across both weak and strong turbulence scenarios. By eliminating the need for iterative feedback and any prior knowledge of the topological charge, we have both simplified the processing pipeline and delivered performance that meets or exceeds the demands of robust OAM demultiplexing.

We believe that these updated performance figures directly address your concern. They confirm that the proposed method is not merely an academic exercise but a practically viable solution for real‐time, cross‐media OAM communication systems. We thank you again for prompting this important discussion, and we trust that our revised purity metrics will satisfy the stringent requirements of the target applications.

Point 1: The English could be improved to more clearly express the research.

Response 1: Thank you for your insightful comment that the English could be improved to more clearly express our research. In response, we have conducted a comprehensive language revision of the entire manuscript, engaging a professional native-English editor to refine grammar, punctuation, and style. We rewrote lengthy or complex sentences to enhance readability, replaced ambiguous or redundant phrasing, and standardized technical terminology. Figure captions has been polished for conciseness and precision, and transitions between sections have been smoothed to improve logical flow. We believe these revisions have greatly strengthened the clarity and coherence of the manuscript, allowing our methodological innovations and experimental results to be communicated more effectively.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

I have no further comments on this paper

Reviewer 3 Report

Comments and Suggestions for Authors

All my previous comments are addressed. I recommend this paper to be published on Photonics.

Back to TopTop