Article

Reinforcement Learning-Driven Framework for High-Precision Target Tracking in Radio Astronomy

by
Tanawit Sahavisit
1,
Popphon Laon
1,
Supavee Pourbunthidkul
1,
Pattharin Wichittrakarn
2,
Pattarapong Phasukkit
1,* and
Nongluck Houngkamhang
3
1
School of Engineering, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
2
International Academy of Aviation Industry, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
3
Department of Nanoscience and Nanotechnology, School of Integrated Innovative Technology, King Mongkut’s Institute of Technology Ladkrabang, Bangkok 10520, Thailand
*
Author to whom correspondence should be addressed.
Galaxies 2025, 13(6), 124; https://doi.org/10.3390/galaxies13060124
Submission received: 13 September 2025 / Revised: 19 October 2025 / Accepted: 29 October 2025 / Published: 31 October 2025
(This article belongs to the Special Issue Recent Advances in Radio Astronomy)

Abstract

Radio astronomy requires precise target localization and tracking to ensure accurate observations. Conventional control methods, including PID controllers, often struggle with pointing errors caused by mechanical limitations, environmental fluctuations, and electromagnetic interference. To address these challenges, this study presents a reinforcement learning (RL)-based framework for high-precision tracking in radio telescopes. The proposed system integrates a positioning control module, a receiver, and an RL tracking agent that operates in scanning and tracking phases. The agent optimizes its policy by maximizing the signal-to-noise ratio (SNR), a critical factor in astronomical measurements. The framework is deployed on a refurbished 12-m radio telescope at King Mongkut's Institute of Technology Ladkrabang (KMITL), originally constructed as a satellite earth station antenna for telecommunications and subsequently adapted for radio astronomy research. It incorporates dual-axis servo control and high-resolution encoders. Real-time SNR measurement and data streaming are provided by a HamGeek ZedBoard with an AD9361 software-defined radio (SDR). The RL agent uses the Proximal Policy Optimization (PPO) algorithm with a self-attention actor–critic model, with hyperparameters tuned via Optuna. Experimental results show strong performance: the system maintained stable tracking of randomly moving, non-patterned targets for over 4 continuous hours without any external tracking assistance, and achieved an SNR improvement of up to 23.5% over programmed TLE-based tracking during live satellite experiments with Thaicom-4.
The simplicity of the framework, combined with its adaptability and ability to learn directly from environmental feedback, highlights its suitability for next-generation astronomical techniques in radio telescope surveys, radio line observations, and time-domain astronomy. These findings underscore RL’s potential to enhance telescope tracking accuracy and scalability while reducing control system complexity for dynamic astronomical applications.

Graphical Abstract

1. Introduction

Astronomy has long represented one of humankind's most persistent scientific frontiers, driven by curiosity about the structure and evolution of the universe. As history shows, advances in observational technology have repeatedly reshaped our understanding, from the subtleties of celestial mechanics to detailed imaging of galaxies and large-scale cosmic structure. Astrophysics in this era spans the entire electromagnetic spectrum, with space-based and ground-based observatories covering complementary frequency ranges. Among these, radio astronomy has been particularly transformative, enabling detection of pulsars, quasars, fast radio bursts, the cosmic microwave background, and neutral hydrogen emissions. These discoveries have fundamentally advanced cosmology, galactic dynamics, and plasma astrophysics.
Technological milestones have shaped observational progress. In optical astronomy, adaptive optics compensated for atmospheric turbulence, achieving near-diffraction-limited imaging from the ground [1]. In radio astronomy, Karl Jansky's 1931 discovery of extraterrestrial radio emission [2] marked the inception of dedicated radio telescopes and opened an entirely new observational window. Recent studies of the cosmic microwave background highlight how essential radio technologies are to the evolution of cosmology. Large-scale facilities such as the Very Large Array (VLA), Square Kilometre Array (SKA), and Five-hundred-meter Aperture Spherical Telescope (FAST) exemplify advances in aperture size, receiver sensitivity, and digital processing, enabling astronomers to study faint galaxies and probe astrophysical phenomena with remarkable precision [3,4].
Despite these capabilities, all modern observatories depend critically on accurate target localization and tracking. Even marginal pointing errors can significantly degrade faint signals that require long integration times [5]. Astronomers generally rely on precise coordinate catalogs and orbital models to point telescopes at well-characterized sources [6,7,8,9]. For new or transient objects, however, no prior information exists, so continuous monitoring is imperative, highlighting the need for robust real-time systems. Achieving such accuracy is challenging because errors arise from multiple sources: structural deformation, drive misalignment, encoder inaccuracies, thermal expansion, wind loading, atmospheric refraction, and electromagnetic interference (EMI) from terrestrial or solar origins [10,11,12,13,14]. While calibration routines and error correction mechanisms are routinely employed, they cannot fully eliminate uncertainty.
Classical control systems, particularly Proportional–Integral–Derivative (PID) controllers, are widely used because of their simplicity and robustness. However, their limited flexibility becomes a significant issue under the nonlinear or rapidly varying conditions characteristic of large radio telescopes, where a basic PID controller yields limited results [15]. More advanced methods, including adaptive control, model-based predictive techniques, and sliding-mode algorithms, have been examined [16,17,18], but they demand accurate dynamic models, and developing such models is computationally expensive, system-specific, and difficult to generalize across instruments. Recent advances in tracking systems combine multiple sensor inputs with Kalman filtering strategies, but these address sensor error rather than fundamental control limitations. Active disturbance rejection control has shown promise in controlled environments [19]; however, it remains unverified on large radio telescopes under real-world conditions.
The recent growth of artificial intelligence (AI) and machine learning (ML) offers an alternative for complex control. Unlike traditional methods, ML-based systems can learn from data, adapt to evolving conditions, and optimize performance without explicit analytical modeling. In domains such as robotics, mechatronics, and autonomous driving, these techniques have demonstrated strong adaptability and precision [20,21,22,23]. Reinforcement learning (RL), in particular, is naturally suited to telescope tracking. RL agents interact with their environment, receiving feedback in the form of rewards or penalties, and iteratively refine their policy to maximize long-term reward [24]. In telescope operations, the optimization target can be directly tied to measurable indicators such as signal-to-noise ratio (SNR).
State-of-the-art RL algorithms further strengthen this paradigm. Proximal Policy Optimization (PPO) [25], used here through the Stable-Baselines3 reinforcement learning library [26], improves stability and sample efficiency compared with earlier methods. Deep learning architectures augmented with self-attention mechanisms [27] enable agents to identify relevant features in complex, high-dimensional environments. Additionally, hyperparameter optimization frameworks such as Optuna [28] automate the search for configurations that converge reliably.
This paper presents a reinforcement learning-based framework for high-precision tracking in radio observatories. The framework integrates three subsystems: a positioning control module for telescope actuation, a receiver system for SNR measurement, and an RL-based tracking agent for adaptive decision-making. The agent operates in two successive stages: a scanning stage that identifies candidate positions to increase the probability of acquiring the source, and a tracking stage that refines alignment to maximize SNR. This iterative loop allows the agent to compensate for structural errors and environmental disturbances in real time.
Experimental validation employed a refurbished 12-m radio telescope at King Mongkut’s Institute of Technology Ladkrabang (KMITL), formerly a telecommunication earth station, which now serves as a research platform for control and signal acquisition experiments in radio astronomy. Rather than aiming at immediate scientific observations, this work emphasizes the design and validation of an intelligent control system capable of maintaining accurate pointing under real-world disturbances such as wind load, mechanical backlash, and atmospheric refraction. The fundamental characteristics of parabolic reflector antennas—such as beamwidth, gain, and pointing accuracy—follow the classical formulations [29,30]. These theoretical foundations serve as the basis for the system calibration and performance evaluation presented in this work.
This setup incorporates servo motors with dual axes, encoders that boast high resolution, and a procedure for zenith calibration to ensure proper alignment (see Section 2). The receiver chain utilizes a HamGeek ZedBoard with an AD9361 software-defined radio (SDR), integrated with custom-developed software by the research team, built on the GNU Radio framework for digital signal processing and real-time SNR computation.
The RL agent is implemented using an actor–critic architecture built on PPO, incorporating self-attention layers. Hyperparameter optimization is conducted with Optuna, ensuring stable training and robust policy performance.
This work presents three key contributions. First, it motivates the use of reinforcement learning in telescope control, particularly its robustness to noise, nonlinearities, and uncertainty. Second, it bridges theory with practice by implementing an AI-driven controller on real telescope hardware. Third, it simplifies the system architecture by using the observation system itself for both tracking and data acquisition, eliminating the need for an auxiliary tracking system such as monopulse tracking [31] and reducing cost and complexity, a benefit relevant wherever similar challenges in real-time localization and noise mitigation arise.
This research establishes a basis for future autonomous observatories by combining reinforcement learning with telescope systems for effective operation in unpredictable conditions. Such advancements not only extend astronomical capabilities but also contribute to applied sensing technologies in radar and beyond.

2. Materials and Methods

The proposed framework consists of three integrated subsystems: the positioning control system, the tracking control system, and the receiver system as illustrated in Figure 1. These components operate collaboratively to achieve high-precision target tracking.

2.1. Positioning Control System

The positioning control system actuates the radio telescope antenna to the commanded position and reports the actual pointing coordinates to the other subsystems. It is built on hardware created during the refurbishment of the satellite communication ground station at KMITL. The mechanical structure uses an X-over-Y configuration: the Y-axis provides the primary north–south movement, while the X-axis governs the secondary east–west motion. The two axes remain orthogonal at 90°, as shown in Figure 2.
Unlike conventional positioning systems in astronomy, which generally use elevation–azimuth (El/Az) coordinates, the renovated telescope operates in an X–Y coordinate system, as illustrated in Figure 3.
Because the mount uses X–Y motion rather than an El/Az mount, the standard El/Az coordinates produced by astronomical algorithms cannot be applied directly to the telescope's drive system. A numerical transformation is therefore required to convert these coordinates into the X–Y frame that matches the mechanical axes of the mount.
Equations (1) and (2) define this transformation explicitly: Equation (1) gives the Y-axis coordinate from the azimuth and elevation angles, and Equation (2) gives the corresponding X-axis coordinate. Both follow from the trigonometric relations between the El/Az and X–Y coordinate systems. This ensures that celestial positions expressed in the El/Az frame map accurately onto the telescope's mechanical actuation system, enabling precise pointing and tracking with the renovated mount.
Y = sin⁻¹(sin(Az) · cos(El))    (1)
X = tan⁻¹(cos(Az) / tan(El))    (2)
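For concreteness, the transformation in Equations (1) and (2) can be sketched as a small Python helper. This is an illustrative implementation, not the telescope's control code; `atan2` is used in place of a plain arctangent so the quadrant survives near the horizon:

```python
import math

def elaz_to_xy(az_deg, el_deg):
    """Map El/Az angles (degrees) to the mount's X-Y angles per Eqs. (1)-(2).

    Returns (X, Y) in degrees: Y from Eq. (1), X from Eq. (2).
    """
    az, el = math.radians(az_deg), math.radians(el_deg)
    y = math.asin(math.sin(az) * math.cos(el))   # Eq. (1)
    x = math.atan2(math.cos(az), math.tan(el))   # Eq. (2), quadrant-safe
    return math.degrees(x), math.degrees(y)
```

At the zenith (El = 90°) the transform returns X = Y = 0, matching the home reference defined later in this section.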
The radio telescope dish is actuated by two 1.3 kW Yaskawa digital servo motors operating under a forward control scheme. Accurate axis orientation is achieved through an encoder assembly consisting of an Omron E6CP-A absolute encoder (gear ratio 1:1) and an E6B2-C incremental encoder (gear ratio 1:360), as illustrated in Figure 4.
Each axis is equipped with a combined encoder assembly, providing a cumulative angular resolution of 0.0005° (1.8 arc-seconds). The overall architecture of the control system, including the integration of the servo motors and encoder components, is presented in Figure 5.
To calibrate the antenna dish at the zenith position and establish the home reference point, a 3-m plumb line was suspended from the center of the upper feed horn opening to a marked reference point at the lower feed horn opening. When the dish is precisely aligned with the zenith, the plumb line coincides with the reference mark, as shown in Figure 6. This position is defined as the home reference, at which the incremental encoder count is reset to zero. To reach this position, the dish is first rotated along each axis until the absolute encoder reading reaches its midpoint. From this point, the dish is advanced in the direction that increases the encoder reading by exactly one step using micro-stepping control. Once the absolute encoder registers the increment, the dish is rotated back by a predetermined number of micro-steps—determined experimentally until the plumb line exactly aligns with the reference mark—ensuring precise positioning at the zenith. This calibration procedure compensates for the coarse resolution of the absolute encoder by using it for approximate positioning, while micro-stepping provides the fine adjustment required for accurate alignment.

2.2. Receiver System

The receiver system is based on a HamGeek ZedBoard integrated with an AD9361 software-defined radio (SDR) module, which interfaces with the feed and LNA chain of the 12 m radio telescope. The AD9361 converts the radio frequency (RF) signal into a digital format and transmits the data over Ethernet to a dedicated mini-PC using a specialized protocol. On the mini-PC, GNU Radio processes the incoming data, enabling real-time computation of the signal-to-noise ratio (SNR) for the tracking system and, when required, streaming of raw complex baseband (I/Q) data. The SNR estimates and raw I/Q data are sent over Ethernet for storage, analysis, and visualization. This architecture provides flexible digital signal processing, accurate SNR monitoring, and high-quality data acquisition, supporting the scientific aims of radio astronomy research. The principal specifications of the HamGeek ZedBoard with AD9361 SDR are summarized in Table 1.
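As an illustration of the kind of SNR estimate the GNU Radio flowgraph produces, the following sketch derives a rough SNR from raw I/Q samples by comparing the strongest spectral bin with the median noise floor. It is a hedged stand-in for the team's custom software, whose internals are not described here:

```python
import numpy as np

def estimate_snr_db(iq, nfft=1024):
    """Rough SNR estimate (dB) from complex baseband samples.

    Averages windowed FFT power over frames, takes the strongest bin as
    'signal' and the median bin as the noise floor. Illustrative only.
    """
    frames = len(iq) // nfft
    spec = np.zeros(nfft)
    for k in range(frames):
        seg = iq[k * nfft:(k + 1) * nfft] * np.hanning(nfft)
        spec += np.abs(np.fft.fft(seg)) ** 2
    spec /= frames
    noise = np.median(spec)                 # robust noise-floor estimate
    signal = spec.max() - noise             # excess power in the peak bin
    return 10.0 * np.log10(max(signal, 1e-30) / noise)
```

A median-based noise floor keeps the estimate robust against a single strong carrier dominating the mean.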

2.3. Tracking Control System

The tracking system consists of a data parser and a reinforcement learning (RL) model, hereafter referred to as the “Agent”. The Agent issues commands to the positioning control system, specifying the pointing coordinates, and receives real-time SNR feedback from the receiver system. The operation of the Agent is divided into two phases: the scanning phase and the tracking phase.
In the scanning phase, three candidate points are arranged at 120° intervals around the last predicted target location, based on the dish position from the previous step. The Agent selects a suitable scanning radius and directs the positioning control system to move the dish through each scanning point in turn.
In the subsequent tracking phase, the Agent predicts the optimal direction and displacement required to move the dish toward the newly estimated target position. This two-phase process, combining exploration and refinement, enables the Agent to maximize SNR while compensating for positional uncertainties. The overall workflow of the scanning and tracking process is illustrated in Figure 7.
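The geometry of the scanning phase can be sketched as follows. The starting phase angle is an illustrative assumption, since the text fixes only the 120° spacing:

```python
import math

def scan_points(cx, cy, radius, phase_deg=90.0):
    """Return the three scanning points spaced 120 deg apart around the last
    predicted target position (cx, cy). `radius` is the scan radius chosen by
    the Agent; `phase_deg` (the starting phase) is an assumed parameter."""
    pts = []
    for k in range(3):
        ang = math.radians(phase_deg + 120.0 * k)
        pts.append((cx + radius * math.cos(ang), cy + radius * math.sin(ang)))
    return pts
```

The three SNR samples measured at these points, together with the previous aiming point, form the state the Agent uses in the tracking phase.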

2.4. Reinforcement Learning Framework

Reinforcement learning, a branch of machine learning, formalizes learning from experience through interaction among three key components: the Agent, the Environment, and the State. At each step, the Agent observes the current state (st) of the Environment and selects an action (at) according to its policy (π). The Environment then transitions to a new state and returns a reward (rt) based on the Agent's action. The Agent uses this feedback to improve its policy, seeking the optimal policy that maximizes cumulative reward across all states. This process is described by a Markov decision process, in which an action taken at one step may influence rewards at subsequent steps, and the cumulative reward can be expressed via a Markov reward process. The relationship between the Agent, the Environment, the selected action, and the resulting reward is illustrated in Figure 8. The cumulative reward is given by Equation (3).
R_t = r_{t+1} + γ r_{t+2} + ⋯ = Σ_{k=0}^{∞} γ^k r_{t+k+1}    (3)
where γ ∈ (0, 1) is the discount factor and r_{t+k+1} is the reward received k + 1 steps after time t.
In the proposed design, the Environment is modeled as a grid world, in which the horizontal axis corresponds to the secondary (east–west) axis and the vertical axis to the primary (north–south) axis of the radio telescope dish. The center of the grid represents the zenith position. The Agent operates in a continuous three-dimensional action space with values ranging from −1.0 to 1.0. The first dimension governs movement along the X-axis, where −1.0 and 1.0 denote the maximum allowable displacement per time step in the negative and positive X directions, respectively, relative to the previous prediction point. The second dimension controls motion along the Y-axis on the same scale, representing maximum displacement per step in the negative and positive Y directions. The third dimension, ranging from 0.0 to 1.0, specifies the scanning radius. The reward function for training the Agent is defined in Equation (4).
r_t = 100 · (1 − D/BW)²    (4)
where r_t denotes the reward obtained from the action taken at step t, D is the Cartesian distance between the actual antenna pointing (x_a, y_a) and the position of the object (x_0, y_0), with D ≤ BW, and BW denotes the beamwidth of the dish antenna.
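A minimal sketch of the reward in Equation (4); treating pointing errors beyond the beamwidth as zero reward is an assumption for the case the equation leaves implicit:

```python
def reward(d, bw):
    """Training reward of Eq. (4): r_t = 100 * (1 - D/BW)^2 for D <= BW.
    Returning 0 once the error exceeds the beamwidth is an assumed
    extension for the out-of-beam case."""
    if d > bw:
        return 0.0
    return 100.0 * (1.0 - d / bw) ** 2
```

The quadratic shape rewards perfect pointing (D = 0) with the maximum of 100 and falls off steeply as the error approaches the beamwidth.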
The remaining environmental variables were formulated to replicate realistic operating conditions, emulating the functional characteristics of a conventional radio telescope. A summary of these parameters is provided in Table 2.
The state information provided by the environment to the Agent consists of the measured positions of each scanning point together with the corresponding signal level (linear). In addition, it includes the position of the previous aiming point, the signal level measured at that location, and the scan radius factor from the last scanning stage. This information forms the basis for the Agent’s decision-making process in subsequent actions. The complete set of values can be formally expressed in Equations (5) and (6).
P_m = U(1 − Acc_ctr, 1 + Acc_ctr) · U(1 − Acc_encoder, 1 + Acc_encoder) · P_c    (5)
where P_m denotes the measured position, P_c the commanded position, U(x, y) a uniform random variable between x and y, Acc_ctr the accuracy of controlling the dish's rotational position, and Acc_encoder the accuracy of reading the dish's position.
Signal_level_Train_measured = U(Acc, 1) · exp(−4 ln 2 · (D/BW)²)    (6)
where Acc is the accuracy factor, D is the Cartesian distance between the antenna pointing position and the target location, and BW denotes the beamwidth of the dish antenna.
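Equations (5) and (6) together define the simulated measurement model used during training. A direct transcription in Python (illustrative; the accuracy parameter values come from Table 2):

```python
import math
import random

def measured_position(p_cmd, acc_ctr, acc_enc):
    """Eq. (5): the commanded position perturbed by multiplicative control and
    encoder accuracy factors, each drawn uniformly from [1 - Acc, 1 + Acc]."""
    return (random.uniform(1 - acc_ctr, 1 + acc_ctr)
            * random.uniform(1 - acc_enc, 1 + acc_enc) * p_cmd)

def measured_signal(d, bw, acc):
    """Eq. (6): Gaussian beam response exp(-4 ln 2 (D/BW)^2), which drops to
    half power at D = BW/2, scaled by a uniform accuracy factor in [Acc, 1]."""
    return random.uniform(acc, 1.0) * math.exp(-4.0 * math.log(2.0) * (d / bw) ** 2)
```

The exp(−4 ln 2 (D/BW)²) term is the standard Gaussian main-beam approximation, so the half-power point lands exactly at half the beamwidth.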

Proximal Policy Optimization (PPO) Algorithm

In this study, the AI model is designed within an actor–critic framework, employing the Proximal Policy Optimization (PPO) algorithm [25] implemented via Stable-Baselines3, an open-source reinforcement learning library [26]. A transformation–attention feature extractor precedes the policy (π) and value (V) networks. The input state vector, comprising the measured position, the corresponding signal-to-noise ratio (SNR), the last aiming point, and the previous SNR at that aiming point, is first normalized and then processed by a feature extractor composed of sequential single-head self-attention layers [27]. In each attention layer, the state is linearly projected into query, key, and value representations of dimension 32; scaled dot-product attention weights are obtained by applying the softmax function to the query–key similarity scores, and these weights are used to aggregate the value representations, producing a transformed output that emphasizes the most informative state components, as illustrated in Figure 9.
Stacking the attention layers enables the model to capture dependencies across the state elements. The final attention output is fed to two separate multilayer perceptrons, the policy network and the value network. Each network contains four fully connected hidden layers with 256 units per layer and hyperbolic tangent (tanh) activations. The policy network outputs a continuous action vector for beam-steering to the next aiming point, while the value network estimates the expected return for the current state. Overall, the architecture contains approximately 400,000 trainable parameters, balancing expressive feature extraction through attention with computational efficiency suitable for real-time operation, as shown in Figure 10.
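The scaled dot-product attention step at the heart of the feature extractor can be sketched with plain NumPy. The weights here are illustrative; in the actual model the dimension-32 projections are learned jointly with the PPO objective:

```python
import numpy as np

def self_attention(state, wq, wk, wv):
    """One single-head scaled dot-product self-attention layer over the state
    elements: project to queries/keys/values, softmax the scaled query-key
    scores, and aggregate the values with the resulting weights."""
    q, k, v = state @ wq, state @ wk, state @ wv
    scores = q @ k.T / np.sqrt(k.shape[-1])          # scaled similarity
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)               # softmax attention weights
    return w @ v                                     # weighted sum of values
```

Stacking several such layers, as described above, lets later layers attend to combinations of state elements rather than raw inputs.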

2.5. Hyperparameter Optimization

To maximize the Agent's performance in the radio telescope environment, a comprehensive hyperparameter tuning process was carried out, focusing on achieving an optimal balance between exploration and exploitation, the clipping parameter of the PPO algorithm for policy updates, the discount factor for future rewards, and other key training parameters. Hyperparameter optimization was performed with Optuna, an open-source framework designed for automated hyperparameter search [28], enabling systematic identification of the most effective configuration. The final model was implemented as an MLP-based actor–critic architecture within the PPO algorithm, employing a cosine annealing learning rate schedule in combination with the optimized parameters. This configuration achieved the targeted performance objectives, with the Agent attaining the highest performance observed among the training scenarios. The complete set of parameters for the final model is summarized in Table 3.
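The cosine annealing learning rate schedule mentioned above can be expressed in the callable form Stable-Baselines3 accepts for `learning_rate`: a function of the remaining training progress (1.0 at the start, 0.0 at the end). The bounds here are illustrative defaults, not the tuned values in Table 3:

```python
import math

def cosine_lr(progress_remaining, lr_max=3e-4, lr_min=1e-5):
    """Cosine annealing: the learning rate decays smoothly from lr_max at the
    start of training to lr_min at the end. lr_max/lr_min are assumed values."""
    t = 1.0 - progress_remaining   # fraction of training elapsed
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * t))
```

The smooth decay takes large update steps early, when the policy is far from optimal, and small steps late, which helps PPO converge stably.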

3. Experimental Setup and Design

This section describes the operational framework, evaluation strategies, and physical setup used to assess the proposed reinforcement learning (RL)-based tracking approach. System performance was evaluated through both simulation-based validation and empirical testing on the refurbished 12-m radio telescope at King Mongkut's Institute of Technology Ladkrabang (KMITL), Bangkok, Thailand (latitude 13.730872° N, longitude 100.787426° E), originally constructed as a satellite earth station antenna for telecommunications.

3.1. Experimental Setup on the Renovated 12-m Radio Telescope at KMITL

The experimental validation of the proposed AI-based tracking framework was carried out on the renovated 12-m radio telescope. The telescope provides two-axis motion via primary and secondary servo actuators, each paired with a high-resolution position sensor. These actuators are driven by dedicated servo packs and control boards, enabling accurate azimuth and elevation positioning with sufficient torque and responsiveness to follow both astronomical and satellite targets. The mechanical substructure thereby provides a stable foundation on which sophisticated control algorithms can be assessed.
On the RF side, the telescope is equipped with a broadband quad-ridged horn antenna combined with a low-noise block (LNB) for preliminary tests with geostationary communication satellites. The LNB down-converts the Ku-band signals to a frequency suitable for further processing, and the output is routed through a low-noise amplifier (LNA) chain to increase sensitivity and suppress system noise prior to digitization. Signal acquisition and real-time monitoring are performed with a software-defined radio (SDR) based on the AD9361 transceiver, which provides both spectral data and signal-level feedback to the tracking agent. This hardware integration yields a high-fidelity test environment that closely mimics operational radio astronomy instrumentation.
The control and monitoring layer integrates custom software with the control boards and SDR front-end via Ethernet, supported by a parser–staging module (Figure 11) that structures telemetry and converts SNR from dB to linear scale. Because the absolute SNR range is unknown, values are stored in an array and normalized using the observed maxima and minima during scanning and tracking. With this approach, both strong and weak radio sources are represented proportionally in the same relative scale, where a normalized value of 1.0 corresponds to the strongest received signal and 0.0 corresponds to the noise level. The reinforcement learning agent operates in this environment, processing the states to generate real-time control actions. A graphical dashboard provides continuous system visualization, while the integrated setup (Figure 12) establishes a platform for evaluating scanning and tracking algorithms under operational conditions.
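The dB-to-linear conversion and running min–max normalization performed by the parser–staging module can be sketched as follows (an illustrative reimplementation, not the authors' code):

```python
def normalize_snr(history_db, snr_db):
    """Convert one SNR sample from dB to linear scale and min-max normalize it
    against all values observed so far: 1.0 maps to the strongest signal seen,
    0.0 to the noise level seen. `history_db` holds prior samples in dB."""
    linear = 10.0 ** (snr_db / 10.0)        # dB -> linear power ratio
    history_db.append(snr_db)
    lin = [10.0 ** (v / 10.0) for v in history_db]
    lo, hi = min(lin), max(lin)
    return 0.0 if hi == lo else (linear - lo) / (hi - lo)
```

Because the scale is relative to the observed extrema, strong and weak radio sources occupy the same normalized range, as described above.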

3.2. Testing and Evaluation Protocols

Using the testing parameters listed in Table 4, the model was evaluated on a fully random-moving target with unpredictable direction and speed. The tracking results, shown in Figure 13, demonstrate that the Agent successfully maintained continuous tracking for four hours. The transition from light to dark in the red and blue markers indicates the actual and predicted pointing positions, respectively. The corresponding Cartesian distance error in degrees is presented in Figure 14, with a maximum error of 0.3°, mean squared error (MSE) of 0.019°, mean absolute error (MAE) of 0.11°, and standard deviation (SD) of 0.1°.
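The error statistics reported above can be reproduced from logged actual and predicted pointing coordinates with a small helper (illustrative; the coordinate-pair convention is an assumption):

```python
import math

def tracking_errors(actual, predicted):
    """Pointing-error statistics of the kind shown in Figure 14: per-step
    Cartesian distance between actual and predicted pointing, then max, MSE,
    MAE, and standard deviation. Coordinates are assumed to be in degrees."""
    d = [math.dist(a, p) for a, p in zip(actual, predicted)]
    n = len(d)
    mae = sum(d) / n
    mse = sum(e * e for e in d) / n
    sd = math.sqrt(sum((e - mae) ** 2 for e in d) / n)
    return {"max": max(d), "mse": mse, "mae": mae, "sd": sd}
```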
To further assess the proposed astronomical use case, simulations were performed using the galaxy M33 as the target rather than a fully random-moving object. The simulations, based on hydrogen line (21 cm) signal tracking, were carried out with Astropy (version 7.1) [32], a dedicated Python 3 library for astronomical calculations, using the parameters summarized in Table 4.
The simulated hydrogen line tracking results for M33, obtained between 20:00 and 24:00 local time (UTC+7) on 21 August 2025, are presented in Figure 15. As shown in Figure 16, the system attained a maximum tracking error of 0.12°, with a mean squared error (MSE) of 0.0091° and a mean absolute error (MAE) of 0.0691°.

4. Results and Discussion

To evaluate the effectiveness of the proposed reinforcement learning (RL)-based tracking framework, a direct comparison was conducted against conventional programmed tracking using two-line element (TLE) data. The geostationary Thaicom-4 satellite, positioned at 119.5° E, was designated as the test target, as its stationary character provides a consistent reference for performance evaluation. Observation was performed on the beacon signal at 11.451 GHz, which was down-converted to an intermediate frequency of approximately 851 MHz, compatible with the AD9361 SDR front-end. Programmed tracking was expected to be highly accurate in this scenario, making it a stringent benchmark for the AI-driven approach.
The programmed tracking session was carried out from 20:30 to 21:00 local time (UTC+7) on 2 September 2025. During this period, the antenna maintained a pointing position near x = 21.86, y = −16.28, with the received SNR reaching a maximum of 25.21 dB and averaging 24.26 dB (Figure 17). This reflects the consistency of TLE-based control but also reveals limitations in achieving optimal signal strength.
In contrast, the RL-based session (21:20–21:50, same day) achieved a maximum SNR of 36.01 dB and an average of 28.84 dB (Figure 18), significantly higher than the programmed baseline.
A direct comparison of the two methods is presented in Figure 19. The programmed method maintained a fixed pointing solution, while the RL framework explored a broader pointing space and consistently converged on higher-SNR regions. The time-domain analysis in Figure 20 further illustrates this difference: the programmed approach sustained a nearly constant but lower SNR, whereas the RL-based method maintained higher average and peak levels, successfully sustaining continuous tracking throughout the full 30-min interval.
Together, these results confirm that the RL framework not only outperforms programmed tracking in terms of maximum and average SNR, but also provides robust performance over extended observation periods. Even under conditions where TLE-based tracking should be most reliable, the RL approach demonstrated superior adaptability and signal optimization.
Beyond these comparative findings, several important considerations must be acknowledged. First, the dependence on raw SNR values for decision-making makes the framework susceptible to interference and spurious responses. This limitation underscores the need for more sophisticated signal processing and classification methods that can reliably distinguish celestial signals from ambient noise. Second, the present scanning procedure, which performs a scan at every step, imposes time overhead and loses observational data during scanning intervals, a limitation also observed in other scanning approaches such as CONSCAN. In the future, the system's efficiency could be further improved by exploiting historical tracking data on the target's motion and position, enabling the artificial intelligence to continuously evaluate trends and predict future trajectories.
Furthermore, a multi-entity framework could be instituted, wherein one entity concentrates on examining under circumstances of elevated unpredictability, while another depends on retrospective deduction to uphold ongoing monitoring. In this regard, a hybrid approach combining real-time scanning and historical tracking data could be employed, allowing the AI to autonomously determine when to perform additional scans in cases of high prediction uncertainty, or to rely on past observations when the prediction accuracy is sufficiently high. This division of roles would enable more adaptive allocation of scanning effort, thereby improving both reactivity and data retention.
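The hybrid scan-or-track policy described above reduces to a simple decision rule. The sketch below is a minimal illustration; the thresholds, field names, and data structure are assumptions for illustration and do not come from the deployed system:

```python
from dataclasses import dataclass

# Hypothetical thresholds for the hybrid policy; values are illustrative.
UNCERTAINTY_THRESHOLD = 0.5  # predicted-position uncertainty (deg) that forces a scan
MIN_CONFIDENCE = 0.8         # trend-model confidence needed to skip scanning

@dataclass
class TrajectoryEstimate:
    position: tuple          # predicted (x, y) pointing in degrees
    uncertainty_deg: float   # 1-sigma uncertainty of the prediction
    confidence: float        # 0..1 confidence of the trend model

def choose_mode(estimate: TrajectoryEstimate) -> str:
    """Decide whether to spend time scanning or to track from the prediction."""
    if estimate.uncertainty_deg > UNCERTAINTY_THRESHOLD:
        return "scan"   # high uncertainty: re-acquire the peak before tracking
    if estimate.confidence >= MIN_CONFIDENCE:
        return "track"  # trust the historical trend and keep observing
    return "scan"       # low confidence: fall back to an extra scan

print(choose_mode(TrajectoryEstimate((1.2, -0.4), 0.1, 0.9)))  # track
print(choose_mode(TrajectoryEstimate((1.2, -0.4), 0.8, 0.9)))  # scan
```

In a full multi-agent design, the "scan" branch would hand control to the exploratory agent and the "track" branch to the retrospective-inference agent.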
In conclusion, the findings support RL as a practical solution for real-time antenna tracking, demonstrating clear superiority over programmed control. At the same time, the discussion identifies key areas for refinement, namely signal discrimination, trajectory prediction, and multi-agent coordination, that could further improve acquisition speed, reduce data loss, and strengthen robustness. These advances would extend the relevance of the framework to large-scale surveys and time-domain astrophysics, where both precision and efficiency are critical.

5. Conclusions

This study demonstrates the potential of reinforcement learning for radio telescope tracking through both simulation and experimental validation on the renovated 12-m radio telescope at KMITL. In simulation, the proposed framework achieved pointing accuracies of approximately 0.3° for randomly moving targets, indicating feasibility for astronomical applications requiring moderate-precision tracking. In live satellite experiments with Thaicom-4, the framework achieved a higher signal-to-noise ratio during operation than conventional TLE-based programmed tracking, demonstrating its practical advantage under real-world conditions. Key advantages of the approach include adaptive performance under uncertain conditions and a simplified system configuration, since the observation antenna serves double duty for both tracking and data acquisition. This architecture eliminates the need for auxiliary tracking antennas, potentially reducing system complexity and maintenance requirements. However, several limitations must be acknowledged. The signals acquired during the scanning and tracking phases require post-processing, limiting applicability to strict real-time operations. The current implementation has been validated only under specific environmental conditions and may require additional robustness testing across diverse operational scenarios. Furthermore, the framework's performance with multiple simultaneous targets or in high-interference environments remains unexplored. Future work should concentrate on comparative analyses with traditional control approaches to quantify performance gains, extended validation under diverse environmental conditions, and strategies for multi-target scenarios. Real-time processing capabilities and computational-efficiency optimizations represent additional areas for development.
Despite these limitations, the results suggest that RL-based approaches represent a promising direction for autonomous telescope operations, particularly as astronomical surveys increasingly demand adaptive tracking of transient and dynamic sources. The framework may also apply to other radio-frequency localization tasks, although validation would be required for each application domain. While the current configuration has limited sensitivity for deep-space sources, the framework demonstrates reliable continuous tracking, highlighting its potential as a foundation for intelligent control systems in next-generation, high-sensitivity radio telescopes.

Author Contributions

Conceptualization, P.P. and T.S.; methodology, T.S.; software, T.S.; validation, T.S., P.P. and P.L.; formal analysis, T.S., P.P. and P.L.; investigation, T.S., P.P., P.L. and S.P.; resources, P.P.; data curation, T.S.; writing—original draft preparation, T.S.; writing—review and editing, T.S., P.P., P.L., P.W., N.H. and S.P.; visualization, T.S.; supervision, P.P.; project administration, T.S.; funding acquisition, P.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by King Mongkut’s Institute of Technology Ladkrabang [2564-02-01-035].

Data Availability Statement

No data were used.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Davies, R.; Kasper, M. Adaptive optics for astronomy. Annu. Rev. Astron. Astrophys. 2012, 50, 305–351.
  2. Hill, R.; Masui, K.W.; Scott, D. The spectrum of the universe. Appl. Spectrosc. 2018, 72, 663–688.
  3. Sutinjo, A.T.; Colegate, T.M.; Wayth, R.B.; Hall, P.J.; de Lera Acedo, E.; Booler, T.; Faulkner, A.J.; Feng, L.; Hurley-Walker, N.; Juswardy, B.; et al. Characterization of a low-frequency radio astronomy prototype array in Western Australia. IEEE Trans. Antennas Propag. 2015, 63, 5433–5442.
  4. Lazio, T.J.W.; Kimball, A.; Barger, A.J.; Brandt, W.N.; Chatterjee, S.; Clarke, T.E.; Condon, J.J.; Dickman, R.L.; Huynh, M.T.; Jarvis, M.J.; et al. Radio astronomy in the LSST era. Publ. Astron. Soc. Pac. 2014, 126, 196–209.
  5. Bhatnagar, S.; Cornwell, T.J.; Golap, K. Solving for the Antenna Based Pointing Errors. EVLA Memo #84, National Radio Astronomy Observatory. 2004. Available online: http://www.aoc.nrao.edu/evla/geninfo/memoseries/evlamemo84.pdf (accessed on 28 August 2023).
  6. Naval Observatory Vector Astrometry Software. Available online: https://aa.usno.navy.mil/software/novas_info (accessed on 23 August 2023).
  7. Stellarium. Available online: http://stellarium.org (accessed on 28 August 2023).
  8. Mokhun, S.; Fedchyshyn, O.; Kasianchuk, M.; Chopyk, P.; Basistyi, P.; Matsyuk, V. Stellarium software as a means of development of students' research competence while studying physics and astronomy. In Proceedings of the 2022 12th International Conference on Advanced Computer Information Technologies (ACIT), Ruzomberok, Slovakia, 26–28 September 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 587–591.
  9. Meeus, J.H. Astronomical Algorithms; Willmann-Bell: Richmond, VA, USA, 1991.
  10. Wang, J.; Zhao, Y.; Yang, C.; Shi, Y.; Hao, Y.; Zhang, H.; Sun, J.; Luo, D. The analysis and verification of IMT-2000 base station interference characteristics in the FAST radio quiet zone. Universe 2023, 9, 248.
  11. Wang, Y.; Zhang, H.; Wang, J.; Huang, S.; Hu, H.; Yang, C. A Software for RFI Analysis of Radio Environment around Radio Telescope. Universe 2023, 9, 277.
  12. Ayodele, P.; Olabisi, F. Interference protection of radio astronomy services using cognitive radio spectrum sharing models. In Proceedings of the 2015 European Conference on Networks and Communications (EuCNC), Paris, France, 29 June–2 July 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 86–90.
  13. Küçük, I.; Üler, I.; Öz, Ş.; Onay, S.; Özdemir, A.R.; Gülşen, M.; Sarıkaya, M.; Daǧtekin, N.D.; Özeren, F.F. Site selection for a radio astronomy observatory in Turkey: Atmospherical, meteorological, and radio frequency analyses. Exp. Astron. 2012, 33, 1–26.
  14. Xu, Q.; Xue, F.; Wang, H.; Yi, L. Measurement and Correction of Pointing Error Caused by Radio Telescope Alidade Deformation based on Biaxial Inclination Sensor. Micromachines 2023, 14, 1283.
  15. Huang, C.N.; Chung, A. An intelligent design for a PID controller for nonlinear systems. Asian J. Control 2016, 18, 447–455.
  16. Tehrani, R.D.; Givi, H.; Crunteanu, D.-E.; Cican, G. Adaptive predictive functional control of XY pedestal for LEO satellite tracking using Laguerre functions. Appl. Sci. 2021, 11, 9794.
  17. De Vicente, P.; Bolaño, R.; Barbas, L. The control system of the 40 m radiotelescope. In Proceedings of the IX Scientific Meeting of the Spanish Astronomical Society, Madrid, Spain, 13–17 September 2010.
  18. Wang, H.; Zhao, X.; Tian, Y. Trajectory tracking control of XY table using sliding mode adaptive control based on fast double power reaching law. Asian J. Control 2016, 18, 2263–2271.
  19. Herbst, G. A Simulative Study on Active Disturbance Rejection Control (ADRC) as a Control Tool for Practitioners. Electronics 2013, 2, 246–279.
  20. Chen, W.; Yung, K.L.; Cheng, K. A learning scheme for low-speed precision tracking control of hybrid stepping motors. IEEE/ASME Trans. Mechatron. 2006, 11, 362–365.
  21. Bai, W.; Zhou, Q.; Li, T.; Li, H. Adaptive reinforcement learning neural network control for uncertain nonlinear system with input saturation. IEEE Trans. Cybern. 2020, 50, 3433–3443.
  22. Li, J.; Wang, Y.; Li, Y.; Luo, W. Reference trajectory modification based on spatial iterative learning for contour control of two-axis NC systems. IEEE/ASME Trans. Mechatron. 2020, 25, 1266–1275.
  23. Zhu, M.; Wang, Y.; Pu, Z.; Hu, J.; Wang, X.; Ke, R. Safe, efficient, and comfortable velocity control based on reinforcement learning for autonomous driving. Transp. Res. Part C Emerg. Technol. 2020, 117, 102662.
  24. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; MIT Press: Cambridge, MA, USA, 2018.
  25. Schulman, J.; Wolski, F.; Dhariwal, P.; Radford, A.; Klimov, O. Proximal policy optimization algorithms. arXiv 2017, arXiv:1707.06347.
  26. Lapan, M. Deep Reinforcement Learning Hands-on: Apply Modern RL Methods to Practical Problems of Chatbots, Robotics, Discrete Optimization, Web Automation, and More, 2nd ed.; Packt Publishing Ltd.: Birmingham, UK, 2020; Available online: https://search.ebscohost.com/login.aspx?drect=true&scope=site&db=nlebk&db=nlabk&AN=2366458 (accessed on 28 August 2023).
  27. Manchin, A.; Abbasnejad, E.; van den Hengel, A. Reinforcement Learning with Attention that Works: A Self-Supervised Approach. arXiv 2019, arXiv:1904.03367.
  28. Akiba, T.; Sano, S.; Yanase, T.; Ohta, T.; Koyama, M. Optuna: A next-generation hyperparameter optimization framework. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2623–2631.
  29. Kraus, J.D. Radio Astronomy, 2nd ed.; Cygnus-Quasar Books: Powell, OH, USA, 1986.
  30. Baars, J.W.M. The Paraboloidal Reflector Antenna in Radio Astronomy and Communication: Theory and Practice; Springer: Dordrecht, The Netherlands, 2007.
  31. Imbriale, W.A.; Yuen, J.H. Large Antennas of the Deep Space Network; Wiley: Hoboken, NJ, USA, 2005.
  32. Price-Whelan, A.M.; Lim, P.L.; Earl, N.; Starkman, N.; Bradley, L.; Shupe, D.L.; Patil, A.A.; Corrales, L.; Brasseur, C.E.; Nöthe, M.; et al. The Astropy Project: Sustaining and growing a community-oriented open-source project and the latest major release (v5.0) of the core package. Astrophys. J. 2022, 935, 167.
Figure 1. Workflow of the proposed framework, showing data flow between the positioning control, tracking control, and receiver subsystems.
Figure 2. The renovated 12-m radio telescope at King Mongkut’s Institute of Technology Ladkrabang (KMITL), showing the X-over-Y mount configuration developed during the ground station conversion project.
Figure 3. Relationship between the standard elevation–azimuth (El/Az) coordinates and the X–Y movement system of the renovated radio telescope.
Figure 4. Configuration of the E6CP-A absolute encoder and the E6B2-C incremental encoder used for axis orientation measurement.
Figure 5. Overall control system architecture of the renovated 12-m radio telescope, showing the integration of servo motors, encoder sets, and control software.
Figure 6. Zenith calibration of the radio telescope dish using a 3-m plumb line aligned between the upper feed horn opening and the reference mark at the lower feed horn opening.
Figure 7. Workflow of the scanning and tracking processes in the proposed reinforcement learning framework.
Figure 8. Conceptual diagram illustrating the relationship between Agent, Environment, Action, and Reward in reinforcement learning.
Figure 9. Architecture of the Self-Attending Network (SAN) in this study.
Figure 10. Overall architecture of the agent in this study.
Figure 11. Data processing flow of the parser–staging module, converting raw positional and SNR measurements into the observation vector.
Figure 12. Experimental setup of the proposed AI-based tracking system on the renovated 12-m radio telescope at KMITL.
Figure 13. Simulation results for the fully random-moving target and its tracking: (a) overview of the tracking trajectory; (b) zoomed view at the final time step. Here, “X” indicates the aiming point, while the dot represents the true target position.
Figure 14. Tracking error in Cartesian distance (degrees) at each time step during the four hours of fully random-moving target tracking.
Figure 15. Tracking simulation results for the M33 galaxy between 20:00 and 24:00 local time (UTC+7) on 21 August 2025: (a) overview of the tracking trajectory; (b) zoomed view at the final time step. Here, “X” indicates the aiming point, while the dot represents the true target position.
Figure 16. Tracking error in Cartesian distance (degrees) at each time step during the M33 galaxy simulation conducted between 20:00 and 24:00 local time (UTC+7) on 21 August 2025.
Figure 17. Programmed tracking result for the Thaicom-4 satellite, showing the antenna pointing in the X–Y plane with the received SNR plotted on the Z-axis. Session: 20:30–21:00 local time (UTC+7), 2 September 2025.
Figure 18. AI-based tracking result for the Thaicom-4 satellite, showing the adaptive pointing in the X–Y plane with SNR on the Z-axis. The session (21:20–21:50 local time (UTC+7), 2 September 2025) achieved a maximum SNR of 36.01 dB and an average of 28.84 dB, significantly higher than the programmed baseline.
Figure 19. Comparison of AI-based tracking and programmed tracking. The AI tracking points achieve higher SNR values. The red and blue vertical lines represent AI-based and programmed tracking points projected onto the X–Y plane, respectively, with their heights indicating the measured signal SNR.
Figure 20. Time-domain comparison of SNR performance. The AI tracking (blue curve) exhibits higher peak and average SNR over time, while the programmed tracking (orange curve) remains steady at a lower average. Horizontal dashed lines indicate the respective average SNR values for each method.
Table 1. HamGeek ZedBoard and AD9361 SDR Specifications.

Parameter | Specification
RF Tuning Range | 70 MHz–6.0 GHz
Instantaneous Bandwidth | Up to 56 MHz
ADC/DAC Resolution | 12-bit
Maximum Sample Rate | 61.44 MS/s (Tx and Rx)
Rx Noise Figure | ~2.5 dB (typical, front-end dependent)
Rx Gain Control | Manual or Automatic Gain Control (AGC)
Tx Output Power | Programmable, up to +7 dBm
Digital Interface | High-speed LVDS to Zynq SoC (ZedBoard)
Ethernet Streaming | Configurable via FPGA logic and embedded Linux
Clocking | Internal oscillator or external 10 MHz reference
Table 2. Operational characteristics of the renovated 12-m radio telescope at King Mongkut’s Institute of Technology Ladkrabang (KMITL).

Parameter | Specification
System | X–Y, Cassegrain reflector, beam-waveguide antenna
Driving system | Digital electric servo with position forward control; system error 5%
Primary axis | 1.5 kW, 1500 rpm, 8.3 N·m, 1:30,000 gearing ratio
Secondary axis | 1.5 kW, 1500 rpm, 8.3 N·m, 1:59,400 gearing ratio
Primary axis speed | 0.30 deg/s
Secondary axis speed | 0.15 deg/s
Position measuring error | 0.1%
Dish aperture | 10 m
Antenna beamwidth | 2 degrees
Operational frequency range | 1–3 GHz and 10.7–12.75 GHz
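The X–Y mount listed in Table 2 requires converting between standard El/Az coordinates and the two mount axes (the relationship shown in Figure 3). As one concrete illustration, the conversion for an X-over-Y mount whose zero position points at the zenith (consistent with the plumb-line zenith calibration in Figure 6) can be written as below; the specific axis orientation and sign convention are assumptions for illustration and may differ from the telescope's actual geometry:

```python
import math

def elaz_to_xy(el_deg, az_deg):
    """El/Az -> X-Y mount angles, assuming the primary (X) rotation is about a
    horizontal north-south axis and X = Y = 0 points at the zenith. This is
    one common convention, not necessarily the telescope's exact geometry."""
    el, az = math.radians(el_deg), math.radians(az_deg)
    # East-North-Up components of the boresight unit vector.
    e = math.cos(el) * math.sin(az)
    n = math.cos(el) * math.cos(az)
    u = math.sin(el)
    x = math.degrees(math.atan2(e, u))  # primary rotation, sweeping east-west
    y = math.degrees(math.asin(-n))     # secondary rotation
    return x, y

def xy_to_elaz(x_deg, y_deg):
    """Inverse of elaz_to_xy under the same convention."""
    x, y = math.radians(x_deg), math.radians(y_deg)
    e = math.sin(x) * math.cos(y)
    n = -math.sin(y)
    u = math.cos(x) * math.cos(y)
    return math.degrees(math.asin(u)), math.degrees(math.atan2(e, n)) % 360.0

x, y = elaz_to_xy(40.0, 50.0)
print((x, y), xy_to_elaz(x, y))  # round-trips back to (40.0, 50.0)
```

Note that azimuth is undefined exactly at the zenith, which is precisely where an X–Y mount (unlike an El/Az mount) has no gimbal-lock singularity.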
Table 3. Hyperparameter configuration of the PPO-based reinforcement learning agent for radio telescope tracking.

Parameter | Value | Description
lr | 5.5 × 10^−5 | Initial learning rate
lr_min | 2.97 × 10^−8 | Minimum learning rate
batch_size | 256 | Minibatch size used to update the network
n_steps | 2048 | Number of steps to run for each environment per update
γ | 0.89 | Discount factor for future rewards
gae_lambda | 0.95 | Bias–variance trade-off factor for the Generalized Advantage Estimator
clip_range | 0.66 | PPO clipping parameter
ent_coef | 0.0 | Entropy coefficient in the loss calculation
vf_coef | 0.5 | Value-function coefficient in the loss calculation
net_arch_pi | [256, 256, 256, 256] | Actor network size
net_arch_vf | [256, 256, 256, 256] | Critic network size
activation_fn | tanh | Activation function for the MLP
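Table 3 lists both an initial learning rate (lr) and a floor (lr_min), implying a decaying schedule. A minimal sketch of a linear decay between the two values follows; the linear shape itself is an assumption, since the decay curve is not specified:

```python
LR_INIT = 5.5e-5   # "lr" in Table 3
LR_MIN = 2.97e-8   # "lr_min" in Table 3

def lr_schedule(progress_remaining: float) -> float:
    """Anneal linearly from LR_INIT (progress_remaining = 1.0, start of
    training) down to LR_MIN (progress_remaining = 0.0, end of training).
    This argument convention matches common RL libraries such as
    Stable-Baselines3, whose parameter names Table 3 appears to follow."""
    return LR_MIN + progress_remaining * (LR_INIT - LR_MIN)

print(lr_schedule(1.0))  # initial rate, approximately 5.5e-05
print(lr_schedule(0.0))  # floor rate, 2.97e-08
```

Such a callable can be passed directly as the learning-rate argument of a PPO trainer that accepts schedules.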
Table 4. Simulation environment parameters applied in the evaluation of the proposed tracking framework.

Parameter | Value | Description
SNR_acc | 0.95 | Accuracy of the SNR measurement
Pointing_error | 0.01 | Pointing error of the position control system
NS_Speed | 0.1 | Maximum speed on the primary axis (deg/s)
EW_Speed | 0.1 | Maximum speed on the secondary axis (deg/s)
Moving_period | 10 | Moving period in seconds for each step
Measuring_period | 4 | SNR measuring period in seconds for each step
Time_zone | 7 | Time zone of the radio telescope location
Observation_lat | 13.7308° | Latitude of the radio telescope location
Observation_lng | 100.7874° | Longitude of the radio telescope location
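One way to read Table 4 is as the noise model of the simulator: the commanded pointing is perturbed by Pointing_error, and the measured SNR is degraded by the SNR_acc accuracy factor. The sketch below is an illustrative interpretation of those parameters, not the authors' simulator code; the Gaussian-beam model and all helper names are assumptions:

```python
import math
import random

SNR_ACC = 0.95         # Table 4: accuracy of the SNR measurement
POINTING_ERROR = 0.01  # Table 4: pointing error of the position control (deg)
BEAMWIDTH_DEG = 2.0    # Table 2: antenna beamwidth

def measure_snr(cmd_x, cmd_y, target_x, target_y, peak_snr_db=36.0, rng=random):
    """Simulate one SNR measurement at a commanded pointing (illustrative).

    The commanded axes are perturbed by the control-system pointing error,
    and an ideal Gaussian-beam response is degraded by a random accuracy
    factor in [SNR_ACC, 1.0] applied to the linear SNR (a small dB penalty).
    """
    actual_x = cmd_x + rng.uniform(-POINTING_ERROR, POINTING_ERROR)
    actual_y = cmd_y + rng.uniform(-POINTING_ERROR, POINTING_ERROR)
    off_deg = math.hypot(actual_x - target_x, actual_y - target_y)
    # Gaussian beam: -3 dB at half the beamwidth from boresight.
    ideal_db = peak_snr_db - 3.0 * (off_deg / (BEAMWIDTH_DEG / 2.0)) ** 2
    accuracy = rng.uniform(SNR_ACC, 1.0)
    return ideal_db + 10.0 * math.log10(accuracy)

print(measure_snr(0.0, 0.0, 0.0, 0.0))  # close to the 36 dB peak
```

An RL agent trained against such a model sees the same kinds of measurement jitter it later encounters on the real antenna.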