Open Access Article

Peer-Review Record

Self-Adjusting Optical Systems Based on Reinforcement Learning

Photonics 2023, 10(10), 1097; https://doi.org/10.3390/photonics10101097

by Evgenii Mareev¹,*, Alena Garmatina¹,², Timur Semenov¹,², Nika Asharchuk¹, Vladimir Rovenko¹ and Irina Dyachkova¹

Reviewer 1: Mubashir Hussain

Reviewer 2: Anonymous

Submission received: 16 August 2023 / Revised: 25 September 2023 / Accepted: 27 September 2023 / Published: 29 September 2023

(This article belongs to the Section Data-Science Based Techniques in Photonics)

Round 1

Reviewer 1 Report

The presented research work is about self-adjusting optical systems based on reinforcement learning. The authors have developed an experimental setup based on reinforcement learning that stabilizes and maximizes the amplitude of the X-ray signal generated in a laser-plasma source. The presented work is interesting and multidisciplinary, but it still requires some minor revisions:

1. Line 77: the power rating of the laser source should be corrected to (1~2.5 ×10¹⁴ W/cm²). I didn’t understand why the authors used “I” and “.” in the power rating formula.

2. Line 81: “The vibrations of the rotating surface on the shaft did not exceed 2 μm.” The unit of vibration should be in terms of frequency. The authors must clarify the statement. Similarly, the next sentence, “This displacement algorithm allowed a single target to be operated for at least 5 hours.”, is not clear.

3. The authors didn’t mention which variables of the experimental setup are used in the neural network algorithm. Also, which data were used for training the machine learning algorithm?

4. Line 143: Training parameters are mentioned in generalized terms. More detailed information is required.

5. What is the delay time to acquire the optimum value of the X-ray signal?

6. There is another article that is very relevant to this research. I suggest mentioning this work in the introduction and comparing the results with previous studies. “Deep reinforcement learning for optical systems: A case study of mode-locked lasers. Machine Learning: Science and Technology. 2020 Oct 15;1(4):045013.”

Author Response

First of all, we want to thank the reviewers for their work. Their comments helped us to improve the quality of our manuscript. All changes are highlighted in the red-line version of the manuscript. Following the reviewers’ and editor’s comments, we expanded the Methods and Discussion sections. The point-by-point answers are presented below.

Line 77: the power rating of the laser source should be corrected to (1~2.5 ×10¹⁴ W/cm²). I didn’t understand why the authors used “I” and “.” in the power rating formula.

Answer: We changed the fragment to "Intensity on the target surface ~2.5*10¹⁴ W/cm²"

Line 81: “The vibrations of the rotating surface on the shaft did not exceed 2 μm.” The unit of vibration should be in terms of frequency. The authors must clarify the statement. Similarly, the next sentence, “This displacement algorithm allowed a single target to be operated for at least 5 hours.”, is not clear.

Answer: Unfortunately, we could not measure the exact vibration per rotation. We measured the mean value of the vibration (MSR) over a 10-minute interval. We added the following information to the text: “(mean square root value over 10-minute interval)”.

Five hours is the maximum duration of an uninterrupted experiment. During this time interval we can guarantee that the dispersion does not exceed 15%. Following the recommendation, we changed the text to: “This displacement algorithm allowed a single target to be operated for at least 5 hours; during this time interval the dispersion of the X-ray intensity does not exceed 15%”
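The 15% criterion in this answer can be illustrated with a short check. This is only a sketch under stated assumptions: the response does not define "dispersion", so it is taken here to be the relative standard deviation (standard deviation divided by the mean) of the recorded X-ray intensity, and the function names and sample values are hypothetical.

```python
import statistics


def relative_dispersion(samples):
    """Relative standard deviation of a signal (assumed meaning of 'dispersion')."""
    mean = statistics.fmean(samples)
    if mean == 0:
        raise ValueError("mean intensity is zero; relative dispersion is undefined")
    # Population standard deviation over the whole recorded interval.
    return statistics.pstdev(samples) / mean


def within_tolerance(samples, limit=0.15):
    """True if the relative dispersion stays at or below the limit (15% by default)."""
    return relative_dispersion(samples) <= limit


# Hypothetical intensity readings in arbitrary units.
stable_run = [100, 110, 90, 105, 95]
unstable_run = [100, 150, 50]
print(within_tolerance(stable_run), within_tolerance(unstable_run))
```

Under a different definition of dispersion (for example peak-to-peak variation) the same readings could pass or fail differently, so the definition should be fixed before comparing runs.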

The authors didn’t mention which variables of the experimental setup are used in the neural network algorithm. Also, which data were used for training the machine learning algorithm?

Answer: We used the intensity of the second harmonic (feedback) and the current coordinate as input parameters for the neural network, and the action as the output. Because this is a reinforcement learning algorithm, the Q-values obtained during the experiments are used for training. Following the recommendation, we added the following paragraph to the text: “In the NN we used the second harmonic intensity as feedback and the current coordinate of the target (multiplied by a factor of 0.001). These values are used as inputs for the NN. The action (move left/move right/wait) is the output of the NN, as presented in Fig. 2. The training parameters were selected after training the NN in the sandbox, as discussed further. The training was performed over 100,000 iterations, which guarantees a stable (close to 100%) result, i.e., in the verification stage the NN stabilizes the system (decreases the dispersion and neutralizes the linear trend). During training the NN initially acts randomly (for ~10,000 iterations; the probability of a random action decays as 10^-4), which is needed to fill the memory buffer. After that the NN acts based on the result of the developed algorithm. The learning process finishes when the error becomes smaller than 0.01 (if the number of iterations is higher than 20,000) or after 100,000 steps.”
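The training schedule described in this answer can be sketched roughly as follows. This is not the authors' code. The constants 0.001, 0.01, 20,000 and 100,000 and the three actions come from the response; everything else is an assumption made for illustration: the function names, the buffer capacity, and the reading of "decays as 10^-4" as a linear decrease of the random-action probability by 10^-4 per iteration (which reaches zero after 10,000 iterations, consistent with the stated random phase).

```python
import random
from collections import deque

ACTIONS = ("move_left", "move_right", "wait")  # action set named in the response
COORD_SCALE = 0.001       # coordinate scaling factor quoted in the response
EPS_DECAY = 1e-4          # assumed: linear decrease of epsilon per iteration
MIN_ITERATIONS = 20_000   # error criterion applies only above this count
MAX_ITERATIONS = 100_000  # hard stop quoted in the response
ERROR_THRESHOLD = 0.01

# Memory (replay) buffer filled during the random phase; capacity is hypothetical.
replay_buffer = deque(maxlen=10_000)


def make_state(second_harmonic, coordinate):
    """Two-component NN input: feedback intensity and scaled target coordinate."""
    return (second_harmonic, coordinate * COORD_SCALE)


def epsilon(iteration):
    """Probability of a random action, assumed to fall linearly from 1 to 0."""
    return max(0.0, 1.0 - EPS_DECAY * iteration)


def choose_action(q_values, iteration, rng=random):
    """Epsilon-greedy choice: random early in training, greedy on Q-values later."""
    if rng.random() < epsilon(iteration):
        return rng.choice(ACTIONS)
    best = max(range(len(ACTIONS)), key=lambda i: q_values[i])
    return ACTIONS[best]


def training_finished(iteration, error):
    """Stop after 100,000 steps, or once error < 0.01 beyond 20,000 iterations."""
    if iteration >= MAX_ITERATIONS:
        return True
    return iteration > MIN_ITERATIONS and error < ERROR_THRESHOLD
```

In a training loop, each step would append a `(state, action, reward, next_state)` tuple to `replay_buffer` and call `training_finished` with the current loss to decide whether to stop; the network update itself is omitted because the response does not describe it.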

Line 143: Training parameters are mentioned in generalized terms. More detailed information is required.

Answer: We added this information to the text as it is mentioned in the previous answer.

What is the delay time to acquire the optimum value of the X-ray signal?

Answer: The training procedure is completed in about two hours. In the working regime it takes about one minute to achieve the optimal value. We added this information to the text of the manuscript.

There is another article that is very relevant to this research. I suggest mentioning this work in the introduction and comparing the results with previous studies. “Deep reinforcement learning for optical systems: A case study of mode-locked lasers. Machine Learning: Science and Technology. 2020 Oct 15;1(4):045013.”

Answer: We added the reference.

Reviewer 2 Report

The manuscript entitled “Self-Adjusting Optical Systems Based on Reinforcement Learning” by Evgenii Mareev et al. proposed an experimental setup based on reinforcement learning that can stabilize and maximize the amplitude of X-ray signal generated in a laser-plasma source.

In general, the manuscript is well-written with details of theories and experimental results. However, I have some questions that need to be addressed before publication. I recommend a minor revision of the manuscript for the authors to answer the following questions:

(1) For Figure 1 and Figure 2, I suggest that the authors add a more detailed description of each figure. One good example is Figure 1 and Figure 2 in this paper: “Lin, Xing, et al. "All-optical machine learning using diffractive deep neural networks." Science 361.6406 (2018): 1004-1008.”

(2) For Figure 5, please add labels (a)(b)(c) for each figure.

(3) For section “3.3. Coupling the laser pulse into a fiber using reinforcement machine learning.”, I think the purpose of coupling light into the fiber is unclear. I suggest that the authors explain the motivation for this section in more detail.

(4) For Figure 5c, I suggest that the authors increase the width of the dashed line. It is hard to see the dashed line clearly in the current setting.

(5) If I understand correctly, there is a mistake in the title of section 3.4, “3.4. Coupling the laser pulse into a fiber using reinforcement machine learning.” It should not be the same as the title of section 3.3. Please correct it if needed.

(6) Ideally, the amplitude w/ ML should be higher than w/o ML, as shown in Figure 4. However, in Figure 6b we can observe that w/ ML (black line), the maximum amplitude is lower than w/o ML (red line). I expect the authors to provide more explanations on this observation.

Author Response

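(1) For Figure 1 and Figure 2, I suggest that the authors add a more detailed description of each figure.

Answer: We have given a more detailed description, as presented in the red-line version of the manuscript.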

(2) For Figure 5, please add labels (a)(b)(c) for each figure.

Answer: We added the labels.

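(3) For section “3.3. Coupling the laser pulse into a fiber using reinforcement machine learning.”, I think the purpose of coupling light into the fiber is unclear. I suggest that the authors explain the motivation for this section in more detail.

Answer: We added the following text fragment to the manuscript: “For this purpose, the neural network was modified to perform movements in two coordinates, thereby adding two additional actions. The light was coupled into the fiber by changing the angle of incidence of the laser beam on the kinematic mirror mount (Thorlabs KM-100). By adjusting two alignment screws, the angle of deviation of the laser beam could be changed, allowing for coupling into the spectrometer.”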
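The extension from one to two coordinates mentioned in this answer amounts to enlarging the discrete action set from three to five actions. A minimal sketch of such an action space is given below. The action names, the unit step, and the mapping of the two axes to the two alignment screws are assumptions for illustration; the response states only that two additional actions were added.

```python
# One-coordinate action set quoted in the response to Reviewer 1.
ACTIONS_1D = ("move_left", "move_right", "wait")

# Hypothetical two-coordinate extension: two extra actions for the second screw.
ACTIONS_2D = ACTIONS_1D + ("move_up", "move_down")

# Each action maps to a step (d_screw_1, d_screw_2) in screw units (assumed).
STEP = {
    "move_left": (-1, 0),
    "move_right": (1, 0),
    "move_up": (0, 1),
    "move_down": (0, -1),
    "wait": (0, 0),
}


def apply_action(position, action, step_size=1):
    """Return the new (screw_1, screw_2) position after taking one action."""
    dx, dy = STEP[action]
    return (position[0] + dx * step_size, position[1] + dy * step_size)


print(len(ACTIONS_2D), apply_action((0, 0), "move_up"))
```

With this layout the network output layer grows from three to five units while the rest of the training loop can stay unchanged, which is one plausible reason to add actions rather than train a second network.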

(4) For Figure 5c, I suggest the authors enlarge the dashed line width. It’s hard to see the dashed line clearly in the current setting.

Answer: We changed the line width and its color in a) and b).

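(5) If I understand correctly, there is a mistake in the title of section 3.4, “3.4. Coupling the laser pulse into a fiber using reinforcement machine learning.” It should not be the same as the title of section 3.3. Please correct it if needed.

Answer: We fixed the typo.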

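(6) Ideally, the amplitude w/ ML should be higher than w/o ML, as shown in Figure 4. However, in Figure 6b we can observe that w/ ML (black line), the maximum amplitude is lower than w/o ML (red line). I expect the authors to provide more explanations on this observation.

Answer: The observed maximum is obtained on the small ledge that forms at the edge of the target, where the stepper motor changes its direction. Local field enhancement at this edge increases the X-ray signal there, but the mean signal decreases. With the NN-based algorithm the ledge does not form, so this maximum is less pronounced. We added this information to the text of the manuscript.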
