Review Reports
- Sarut Puangragsa,
- Tanawit Sahavisit and
- Popphon Laon
- et al.
Reviewer 1: Anonymous
Reviewer 2: Qichang An
Reviewer 3: Anonymous
Round 1
Reviewer 1 Report
Comments and Suggestions for Authors
This paper presents research on the control of tracking systems that emerge when communication antennas are converted into radio telescopes. Effective control of radio telescopes -- such as accurate correction of the pointing direction -- is essential due to various factors, including radio frequency interference (RFI). In this study, the authors aim to achieve such control using deep reinforcement learning. The research is regarded as original, promising, and of high quality.
The paper is logically structured and provides a clear and detailed description of its objectives, methodology, results, and discussion. It would be suitable for publication once the following two issues are addressed:
Lines 152 - 153: The numerical values are incorrect and should be corrected.
Lines 214 - 229: This section repeats the same content as in the introduction. Redundant statements of this kind are undesirable and should be deleted or condensed.
Author Response
Comments 1: The paper is logically structured and provides a clear and detailed description of its objectives, methodology, results, and discussion. It would be suitable for publication once the following two issues are addressed:
Lines 152 - 153: The numerical values are incorrect and should be corrected.
Lines 214 - 229: This section repeats the same content as in the introduction. Redundant statements of this kind are undesirable and should be deleted or condensed.
Response 1: We appreciate the confirmation that the foundational aspects of the paper are sound. We have focused on the specific line-item corrections below, which we believe address the remaining potential for improvement.
Line 152 – 153: We have corrected the formatting and ensured the consistency of the unit notation in the final draft.
Line 214 – 229: The identified section (Lines 214–229 in the original PDF), which covered the general background on RL inefficiency, the need for simulation, and the concept of sim-to-real transfer, was indeed repetitive of the introduction. We have deleted the unnecessary introductory text. Section 2.3 starts directly with methodological specifics (The simulated environment was developed in Python...). This significantly condenses the section and improves flow.
Reviewer 2 Report
Comments and Suggestions for Authors
Reviewer Comments
This manuscript presents a novel application of DRL (including recurrent architectures such as LSTM) to the control and scheduling of a ground-based single-dish radio telescope operating under dynamic RFI conditions. The authors construct a high-fidelity simulation environment and demonstrate a sim-to-real transfer on an operational telescope. Experimental results indicate that the proposed Custom-CNN and LSTM-based agent significantly outperform the non-recurrent baseline, frame-stacked agent, and proportional controller in both survey coverage (deg²/h) and robustness. The topic is innovative and practically relevant.
However, several key aspects require further clarification and strengthening in order to improve the clarity, rigor, and persuasiveness of the manuscript. My specific comments are as follows:
- The abstract lacks quantitative results, weakening the conclusion.
Issue: The abstract provides only general statements and does not report core performance metrics.
Recommendation: Please include key quantitative indicators to highlight the effectiveness of the proposed method. For example:
“...Experiments show that the recurrent agent achieves a survey coverage of 475 deg²/h, representing a 72.7% improvement over the non-recurrent baseline, with only 1% degradation in real-world deployment.”
- The introduction suffers from a diffuse focus and unclear logical flow.
Issue: The introduction interleaves basic telescope background with project-specific engineering details, making it difficult for the reader to quickly grasp the scientific problem and methodological contribution.
Recommendation: Consider restructuring the introduction following a clearer narrative:
RFI challenges → Limitations of existing approaches → DRL as a promising solution → Summary of contributions.
Engineering details about the telescope conversion can be moved to the “Materials and Methods” section.
- The dynamic modeling of RFI sources in the simulation is insufficiently described.
Issue: The paper does not clearly explain how RFI sources (e.g., aircraft, satellites) are generated, modeled, or updated over time within the simulator.
Recommendation: Please provide explicit descriptions of the RFI generation logic, motion/trajectory models (e.g., whether real ADS-B data are used), and update mechanisms across simulation timesteps. This information is essential for transparency and reproducibility.
- The domain randomization strategy for sim-to-real transfer is vague.
Issue: The manuscript mentions the use of “random torques” and other perturbations but does not specify the parameter ranges or types of randomization applied.
Recommendation: Please list all domain randomization parameters used during training, including: torque fluctuation ranges, motor response delays, wind disturbance models (magnitude/duration), sensor noise characteristics, etc.
This will allow readers to assess the adequacy of the sim-to-real robustness strategy.
- Essential training details and hyper-parameters are missing.
Issue: PPO hyper-parameters (learning rate, discount factor, clip ratio, batch size, etc.) and computational resource details are not provided.
Recommendation: Please include a complete table of training hyper-parameters in the “Model Training” section, and briefly describe the hardware setup (GPU type, number of environments, approximate training time).
- Ablation studies are needed to validate the contribution of each component.
Issue: Current results do not isolate the impact of key architectural and design choices (Custom-CNN, LSTM memory, reward shaping, pre-training, domain randomization).
Recommendation: Add a dedicated ablation study section that systematically removes or replaces individual components and reports the corresponding performance changes. This will substantially strengthen the empirical justification for the proposed architecture.
Summary
The manuscript presents promising and practically valuable work, but addressing the issues above will significantly improve its clarity, rigor, and replicability. I appreciate the authors’ efforts and look forward to seeing a revised version.
Author Response
Comments 1: The abstract lacks quantitative results, weakening the conclusion.
Response 1: We have revised the Abstract to include the core quantitative performance metrics and transfer stability data as recommended.
The revised Abstract now explicitly states: "...Experiments show that the recurrent agent achieves a mean survey coverage of 475 deg²/h, representing a 72.7% improvement over the non-recurrent baseline, and maintained exceptional stability with only 1.0% degradation in median coverage during real-world deployment."
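As a quick consistency check on the quoted figures, the baseline coverage implied by the abstract can be recovered from the two reported numbers. This is only back-of-the-envelope arithmetic on the rounded values quoted above; the symbols $C_{\text{rec}}$ and $C_{\text{base}}$ are introduced here for illustration, and the manuscript's own baseline figure may differ slightly because of rounding.

```latex
C_{\text{rec}} = (1 + 0.727)\, C_{\text{base}}
\quad\Longrightarrow\quad
C_{\text{base}} = \frac{475\ \text{deg}^2/\text{h}}{1.727} \approx 275\ \text{deg}^2/\text{h}
```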
Comments 2: The introduction suffers from a diffuse focus and unclear logical flow.
Response 2: Implemented. We have substantially restructured the Introduction to improve the narrative flow. The specific engineering details concerning the KMITL conversion, X-Y pedestal difficulties, and the specific needs for motor/encoder replacement have been consolidated and moved from the Introduction to a dedicated subsection in Materials and Methods.
Comments 3: The dynamic modeling of RFI sources in the simulation is insufficiently described.
Response 3: Implemented. We have added a new paragraph in Section 2.3.1 (Observation Space) that details the RFI modeling strategies used in the simulator. We clarify that RFI from geostationary satellites is modeled as stationary sources based on site survey data, while RFI from aircraft is modeled using realistic flight information (speed/altitude/heading) from real-world ADS-B data captured during a 7-day site survey. This information was replayed during the simulation. The simulation was also augmented with random transit flights that are generated at runtime and updated at every simulation step.
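The update logic described in this response could be sketched roughly as follows. This is a minimal illustration, not the authors' simulator: the class `RfiSource`, the functions `spawn_transit_flight` and `update_sources`, the flat east/north map in kilometres, the speed range, and the spawn probability are all assumptions made for the example.

```python
import math
import random
from dataclasses import dataclass


@dataclass
class RfiSource:
    """Hypothetical RFI source on a flat local east/north map (km)."""
    east_km: float
    north_km: float
    speed_kmh: float = 0.0    # 0 models a stationary (geostationary) source
    heading_deg: float = 0.0  # compass heading: 0 = north, 90 = east

    def step(self, dt_s):
        """Advance the source by dt_s seconds along its heading."""
        distance_km = self.speed_kmh * dt_s / 3600.0
        heading = math.radians(self.heading_deg)
        self.east_km += distance_km * math.sin(heading)
        self.north_km += distance_km * math.cos(heading)


def spawn_transit_flight(rng, radius_km=100.0):
    """Create a random flight on the region boundary, heading toward the centre."""
    bearing_deg = rng.uniform(0.0, 360.0)
    bearing = math.radians(bearing_deg)
    return RfiSource(
        east_km=radius_km * math.sin(bearing),
        north_km=radius_km * math.cos(bearing),
        speed_kmh=rng.uniform(600.0, 900.0),  # placeholder range, not from the paper
        heading_deg=(bearing_deg + 180.0) % 360.0,
    )


def update_sources(sources, dt_s, rng, spawn_prob=0.01):
    """Move every source one timestep and occasionally add a transit flight."""
    for source in sources:
        source.step(dt_s)
    if rng.random() < spawn_prob:
        sources.append(spawn_transit_flight(rng))
    return sources
```

In this sketch a replayed ADS-B track would simply be a list of `RfiSource` objects initialised from recorded speed and heading, while stationary sources keep `speed_kmh` at zero and therefore never move.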
Comments 4: The domain randomization strategy for sim-to-real transfer is vague.
Response 4: Implemented. We have included the description of domain randomization in Section 2.3.
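For readers unfamiliar with the technique, per-episode domain randomization typically amounts to drawing physical parameters from fixed ranges at every environment reset. The sketch below is illustrative only: the parameter names echo the reviewer's list, and every range in `RANGES` is a placeholder invented for the example, not a value from the manuscript.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class EpisodeParams:
    """Hypothetical physical parameters resampled at each episode reset."""
    torque_noise_nm: float    # amplitude of random torque disturbance
    motor_delay_s: float      # motor response delay
    encoder_noise_deg: float  # standard deviation of position sensor noise


# Placeholder ranges for illustration only; NOT taken from the manuscript.
RANGES = {
    "torque_noise_nm": (0.0, 5.0),
    "motor_delay_s": (0.0, 0.2),
    "encoder_noise_deg": (0.0, 0.05),
}


def sample_episode_params(rng):
    """Draw each parameter uniformly from its configured range."""
    return EpisodeParams(
        **{name: rng.uniform(low, high) for name, (low, high) in RANGES.items()}
    )
```

Calling `sample_episode_params(random.Random(seed))` once per reset gives the agent a slightly different simulated telescope each episode, which is the mechanism intended to make the learned policy tolerant of the mismatch between simulation and hardware.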
Comments 5: Essential training details and hyper-parameters are missing.
Response 5: Implemented. We have added a comprehensive table of PPO hyper-parameters in Section 2.4.2 (Model Training), now designated as Table 6.
Comments 6: Ablation studies are needed to validate the contribution of each component.
Response 6: Implemented through Reframing and Clarification. We sincerely appreciate this rigorous requirement, which is essential for validating architectural contributions. We respectfully contend that the initial experimental design already provides two core, quantitative ablation studies isolating the main components. To meet your expectation for clarity and rigor, we have explicitly reframed the presentation of these results within the revised manuscript and included empirical justification for the pre-training step. While we share the reviewer's desire for exhaustive ablation of every key factor (e.g., reward shaping or specific domain randomization settings), the current combination of studies is sufficient to validate our core architectural choices within the realistic constraints of the project. We have also listed this as a future research direction.
Reviewer 3 Report
Comments and Suggestions for Authors
Summary
This paper presents a technique for telescope control and scheduling using Deep Reinforcement Learning (DRL). In particular, the authors employ a Convolutional Neural Network (CNN) feature extractor and a Long Short-Term Memory (LSTM) network to account for time dependence and add some degree of foresight to the DRL agent. The aim is to create a scheduling agent capable of adapting to the dynamic RFI environment at the KMITL radio telescope. This is a good addition to the rapidly growing field of using machine learning techniques for dynamic scheduling of observing instruments. The description of the problem, the stated goals, and the methodology are solid and well presented, though with significant grammatical errors. In contrast, the 'Results' section includes basic mistakes and lacks clarity; the plots are not described well. However, with some modifications (see specific comments below), it can be published.
Comments on specific sections
Abstract
The summary of the main results should quantify the improvements stated; if this novel approach is better, how much better is it? If it's more robust, what's the metric and quantity?
Section 2
Line 116: The word 'telescope' is spelled with a hyphen, it should not be.
Line 154: The start time symbol 'tr' is referenced as normal text, not as a term/symbol from an equation. This is one of many such cases, including 'N_M, J_M, J_L, T_{PM}, T_L' in lines 156 - 158. The same applies for figures.
Equation 4: I couldn't find where the symbol E_\pi is defined.
Line 221 - 221: First provide the citation for the referenced work. Secondly, this seems like a very simple case (unlike the one you are dealing with), so please expand a bit on why you think it is worth mentioning.
Line 276: Astropy is a very large library, you need to state the module/function that you used.
Line 287: Figure 3 does not show the map that the text describes.
Line 448: Add a citation to the work you are referencing.
Section 3
Line 487: This line ends with a ':' which made me expect some bulleted/numbered descriptions, but none followed. Is this a typo?
Figure 13: There's a lot of information in this plot; it shows the main result of the study. It needs to be described more clearly both in the text body and the caption.
Line 531: It looks like you are confusing the IQR of the non-recurrent agent with that of the proportional controller.
Table 7: You use symbols Q1, Q2, Q3 without defining them in the text body.
Figure 11: What are the different line types; the noisy vs smooth line types? This should be clearly stated in the caption. This also applies to the rest of the plots in this paper.
Discussion
The fact that the recurrent agent converges after 18 million time steps, which is 8 and 11 million more than the frame-stacking and non-recurrent agents, respectively, is noteworthy. A discussion of this should be included, as well as potential avenues for improvement.
Comments on the Quality of English Language
The quality of the writing is OK, but there are too many grammatical errors for a potential peer-reviewed article. I suggest the use of a grammar checker before re-submission.
Author Response
Comment 1: The summary of the main results should quantify the improvements stated; if this novel approach is better, how much better is it? If it's more robust, what's the metric and quantity?
Response 1: Implemented. We have revised the Abstract to include the core quantitative performance metrics and transfer stability data as recommended.
The revised Abstract now explicitly states: "...Experiments show that the recurrent agent achieves a mean survey coverage of 475 deg²/h, representing a 72.7% improvement over the non-recurrent baseline, and maintained exceptional stability with only 1.0% degradation in median coverage during real-world deployment."
Comment 2: Line 116: The word 'telescope' is spelled with a hyphen, it should not be.
Response 2: Corrected
Comment 3: Line 154: The start time symbol 'tr' is referenced as normal text, not as a term/symbol from an equation. This is one of many such cases, including 'N_M, J_M, J_L, T_{PM}, T_L' in lines 156 - 158. The same applies for figures.
Response 3: Corrected
Comment 4: Equation 4: I couldn't find where the symbol E_\pi is defined.
Response 4: Implemented. The definition of E_pi [G_t] is included before the equation.
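For context, in standard reinforcement-learning notation the quantity in question is the expected discounted return under the policy. The form below follows the usual textbook convention and is an assumption here; the manuscript's exact definition, including its indexing of rewards, may differ.

```latex
G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1}, \qquad 0 \le \gamma < 1,
\qquad
\mathbb{E}_{\pi}\!\left[ G_t \right] = \text{expected value of } G_t \text{ when actions are drawn from the policy } \pi
```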
Comment 5: Line 221 - 221: First provide the citation for the referenced work. Secondly, this seems like a very simple case (unlike the one you are dealing with), so please expand a bit on why you think it is worth mentioning.
Response 5: Implemented. This work was cited already in the introduction (Line 98). However, as the reviewer has pointed out, Lines 214–229 in the original version, which covered the general background on RL inefficiency, the need for simulation, and the concept of sim-to-real transfer, were indeed repetitive of the introduction. We have deleted the unnecessary introductory text. Section 2.3 starts directly with methodological specifics (The simulated environment was developed in Python...). This also condenses the section and improves flow.
We have retained the specific citation (18) in the PDF, which is the Rubik's Cube work. Despite the difference in scale, lines 58-60 of the original text emphasize that controlling the robotic arm used in the cited work is analogous to controlling the X-Y pedestal due to the shared underlying challenges of multi-axis, 2-DOF non-linear control, which DRL successfully tackles in both cases.
Comment 6: Line 276: Astropy is a very large library, you need to state the module/function that you used
Response 6: Implemented. We agree that specifying the modules is essential for reproducibility and have specified the Astropy modules used (specifically the Astronomical Coordinate Systems modules).
Comment 7: Line 287: Figure 3 does not show the map that the text describes.
Response 7: Corrected. The original Figure 3 has been removed from the manuscript as it did not offer enough insight into the core ideas of the work. The text referencing the figure has been deleted, ensuring the description of the RFI environment remains intact without the incorrect reference. All subsequent figures and tables have been consistently renumbered.
Comment 8: Line 448: Add a citation to the work you are referencing.
Response 8: The work was cited as NatureCNN (citation 29): Mnih et al., Human-level control through deep reinforcement learning.
Comment 9: Line 487: This line ends with a ':' which made me expect some bulleted/numbered descriptions, but none followed. Is this a typo?
Response 9: Corrected
Comment 10: Figure 13: There's a lot of information in this plot; it shows the main result of the study. It needs to be described more clearly both in the text body and the caption.
Response 10: Implemented. We appreciate the reviewer's comment regarding the clarity of Figure 13. We agree that this figure is critical to the study's results, and we have implemented substantial revisions to ensure maximum interpretability in both the text and the figure caption.
Comment 11: Line 531: It looks like you are confusing the IQR of the non-recurrent agent with that of the proportional controller.
Response 11: Corrected.
Comment 12: Table 7: You use symbols Q1, Q2, Q3 without defining them in the text body.
Response 12: Implemented
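By the usual convention, Q1, Q2, and Q3 denote the first quartile, the median, and the third quartile, and the interquartile range (IQR) raised in Comment 11 is Q3 minus Q1. A minimal sketch with made-up coverage samples follows; the numbers are illustrative and are not data from the manuscript, and the `inclusive` quantile method is one of several common conventions.

```python
import statistics

# Made-up survey coverage samples in deg^2/h, for illustration only.
coverage = [300, 350, 400, 450, 500]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3.
q1, q2, q3 = statistics.quantiles(coverage, n=4, method="inclusive")
iqr = q3 - q1

print(f"Q1={q1}, Q2={q2}, Q3={q3}, IQR={iqr}")
```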
Comment 13: Figure 11: What are the different line types; the noisy vs smooth line types? This should be clearly stated in the caption. This also applies to the rest of the plots in this paper.
Response 13: Implemented. The line types used are described in the figure captions.
Comment 14: The fact that the recurrent agent converges after 18 million time steps, which is 8 and 11 million more than the frame-stacking and non-recurrent agents, respectively, is noteworthy. A discussion of this should be included, as well as potential avenues for improvement.
Response 14: Implemented. We concur that the significantly longer training period for the recurrent agent is a critical factor. Beyond the number of steps needed to converge, the LSTM architecture also made the agent run 2-3 times slower than its counterparts. For a larger model, training an LSTM agent from scratch might therefore not be realistic. A discussion of this issue is included in the revised version.
Round 2
Reviewer 2 Report
Comments and Suggestions for Authors
The authors have provided extremely comprehensive, meticulous, and high-quality responses to all the reviewers' comments. All six main points raised during the initial review have been addressed fully and excellently. The revised manuscript demonstrates marked and qualitative improvements in its scientific rigor, precision, completeness, and readability.
- Exceptional Supplementation of Quantitative Results: The authors have not only incorporated key quantitative performance metrics (such as the 475 deg²/h survey coverage rate and the 72.7% performance improvement) into the abstract but have also provided ample presentation throughout the Results section using rich graphs and statistical distribution data (e.g., box plots, interquartile range). This makes the paper's conclusions highly convincing.
- Meticulous Restructuring of Logic: The Introduction section has been reorganized as suggested, forming a clear narrative flow: "RFI Challenge → Limitations of Existing Approaches → DRL as a Solution → Summary of Contributions." This revision significantly enhances the paper's readability, allowing readers to quickly grasp the core research problem and innovative contributions.
- In-Depth Refinement of Methodological Details: The authors have provided extremely detailed descriptions of the dynamic modeling of RFI and the domain randomization strategy, which were points of concern in the review. This includes the use of ADS-B data, the parameter ranges for random torque, and transient disturbance models. These additions not only meet the review requirements but also greatly enhance the transparency and reproducibility of the research, reflecting the authors' rigorous scientific attitude.
- Powerful Validation of Model Contributions: The newly added systematic ablation experiments are a major highlight of this revision. Supported by detailed data, the authors clearly delineate and demonstrate the individual contributions of various core components, such as the custom CNN feature extractor, supervised pre-training, and the LSTM memory module. This provides solid and credible empirical support for the overall architecture proposed in the paper, significantly strengthening its argument.
- Formulas (1) and (2) should be divided into several formulas and introduced separately.
Author Response
Comment: Formulas (1) and (2) should be divided into several formulas and introduced separately
Response: Implemented. Formula (1) is now divided into (1)-(4) and introduced separately. Similar treatment is applied to formula (2). All the subsequent formulas have been renumbered accordingly.