Reinforcement Learning and Physics

Abstract: Machine learning techniques provide a remarkable tool for advancing scientific research, and this area has grown significantly in the past few years. In particular, reinforcement learning, an approach that maximizes a (long-term) reward by means of the actions taken by an agent in a given environment, can help optimize scientific discovery in a variety of fields such as physics, chemistry, and biology. Moreover, physical systems, in particular quantum systems, may enable more efficient reinforcement learning protocols. In this review, we describe recent results in the field of reinforcement learning and physics. We include standard reinforcement learning techniques from the computer science community for enhancing physics research, as well as the more recent and emerging area of quantum reinforcement learning, inside quantum machine learning, for improving reinforcement learning computations.


Introduction
The ubiquity of ever larger data sets has made machine learning (ML) a common tool for knowledge discovery from those data sets. The solid theoretical foundations of ML [1,2] have found application in many different fields [3], physics being no exception [4,5]. In the particular case of quantum information processing [6], there are remarkable attempts to use quantum resources to enhance learning, particularly in terms of processing time, which can be reduced considerably, yielding speedups that are sometimes quadratic or exponential [7].
The application of classical ML to different problems in physics has also become increasingly common [8][9][10], even leading to the field of physics-based ML [11,12]. Reinforcement learning (RL), the learning paradigm that this review focuses on, has been applied to the control of physical systems [13][14][15][16].
The next section reviews the main contributions of classical RL to different problems in physics (Section 2.1) and of quantum RL (Section 2.2). Section 3 closes this mini-review with some concluding remarks.

Reinforcement Learning and Physics
RL is an ML paradigm that optimizes multi-stage (sequential) decision making [17]. It cannot be considered supervised learning, because the desired outputs are not known in advance to train the model; it is not unsupervised or semisupervised learning either, because training is not limited by a lack of labels [2,18].
The RL framework is shown in Figure 1; it consists of an agent that takes actions in a given environment, and each action has an associated immediate reward. The goal of the learning process is to maximize the long-term reward, a function that depends on the sum of all the rewards collected over time:

$$R_t = \sum_{k=0}^{\infty} \gamma^{k} \, r_{t+k+1}, \tag{1}$$

where the immediate reward $r_{t+k+1}$ is the value returned by the environment in response to the action taken by the agent at time $t + k$, and $\gamma \in [0, 1]$ is the so-called discount rate, which sets the relative importance of future rewards. The environment is explored by taking different actions, which leads to learning the action-value function or Q-function, which estimates the expected future reward $R_t$ when following the policy $\pi(s, a)$:

$$Q^{\pi}(s, a) = E_{\pi}\left[ R_t \mid s_t = s, \, a_t = a \right]. \tag{2}$$

[Figure 1. The RL framework: in state $s_t$ (before action $a_t$) the agent takes action $a_t$; the environment returns the reward $r_{t+1}$ and the state $s_{t+1}$ (after action $a_t$).]

The optimal policy appears as the result of learning the Q-function:

$$\pi^{*}(s) = \arg\max_{a} Q^{\pi}(s, a). \tag{3}$$

Equation (3) corresponds to a deterministic policy, in which, given a state, only one action can be taken. Stochastic policies encompass more than one possible action, chosen according to probabilities encoded in the policy [17].
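The two quantities described above, the discounted long-term reward and the deterministic policy read off from a Q-function, can be made concrete with a minimal sketch. It assumes the standard discounted sum, R_t = Σ_k γ^k r_{t+k+1}, and a greedy (arg max) policy; the reward sequence, the state and action names, and the Q-values are hypothetical toy numbers, not taken from any reference in this review.

```python
from typing import Dict, List, Tuple


def discounted_return(rewards: List[float], gamma: float) -> float:
    """Return sum_k gamma**k * rewards[k], where rewards[k] plays the role of r_{t+k+1}."""
    total = 0.0
    # Accumulate from the last reward backwards: R = r + gamma * R_next.
    for r in reversed(rewards):
        total = r + gamma * total
    return total


def greedy_policy(q: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    """Deterministic policy: for each state, pick the action with the largest Q-value."""
    best: Dict[str, Tuple[str, float]] = {}
    for (state, action), value in q.items():
        if state not in best or value > best[state][1]:
            best[state] = (action, value)
    return {state: action for state, (action, _) in best.items()}


# Hypothetical toy data (illustration only).
toy_rewards = [1.0, 0.0, 2.0]
toy_return = discounted_return(toy_rewards, gamma=0.5)  # 1 + 0.5*0 + 0.25*2 = 1.5

toy_q = {
    ("s0", "left"): 0.2,
    ("s0", "right"): 0.7,
    ("s1", "left"): 0.9,
    ("s1", "right"): 0.1,
}
toy_policy = greedy_policy(toy_q)

print(toy_return)
print(toy_policy)
```

With γ close to 0 the agent is short-sighted and only the first rewards matter; with γ close to 1 distant rewards weigh almost as much as immediate ones.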
The goal of RL algorithms is to calculate $Q^{\pi}(s, a)$ in order to obtain the optimal policy using Equation (3). Many different methods can be considered, basically grouped into three main approaches [17]: dynamic programming, Monte Carlo methods, and temporal-difference (TD) methods.

Sarsa and Q-learning are the most widely used TD methods. Sarsa is an on-policy algorithm that modifies the starting policy towards the optimal one, whereas Q-learning is off-policy and computes the optimal policy while the agent interacts with the environment by means of another arbitrary policy, Equation (4):

$$Q_{t+1}(s, a) = Q_t(s, a) + \alpha \left[ r_{t+1} + \gamma \max_{a'} Q_t(s', a') - Q_t(s, a) \right], \tag{4}$$

where $\alpha$ is the rate of the update, and $s'$ and $a'$ are the state and the action at time $t + 1$, respectively. $Q_t$ stands for the action-value function for a particular state before being visited at time $t$, and $Q_{t+1}$ is the updated value of that state once it has been visited.

RL is a natural approach to system control that has been successfully applied in many different fields, from robotics [21] to marketing [22] and medicine [23], to name a few. In the case of physics, two main approaches arise: the use of standard RL to control different parts of physical systems (either classical or quantum), and the so-called quantum RL, a quantum version of RL that shares objectives with other quantum ML approaches, i.e., to make use of quantum technologies to carry out ML calculations more efficiently. Both approaches are described in the next two subsections.
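The difference between the off-policy Q-learning update and the on-policy Sarsa update described above can be sketched in a few lines of tabular code. This is a minimal illustration under stated assumptions, not an implementation from any cited work: the five-state chain environment, the epsilon-greedy behaviour policy, and all parameter values (`alpha`, `gamma`, `epsilon`, `episodes`, `seed`) are hypothetical choices made for the example.

```python
import random
from typing import List, Tuple

N_STATES = 5      # hypothetical toy chain: states 0..4, state 4 is terminal
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state: int, action: int) -> Tuple[int, float, bool]:
    """Toy dynamics: reward 1 on reaching the last state, 0 otherwise."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def epsilon_greedy(q: List[List[float]], state: int, epsilon: float,
                   rng: random.Random) -> int:
    """Behaviour policy: random action with probability epsilon, else greedy (ties broken at random)."""
    if rng.random() < epsilon:
        return rng.choice(ACTIONS)
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(method: str = "q_learning", episodes: int = 500, alpha: float = 0.5,
          gamma: float = 0.9, epsilon: float = 0.2, seed: int = 0) -> List[List[float]]:
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)
        for _ in range(200):  # cap on episode length
            nxt, reward, done = step(state, action)
            next_action = epsilon_greedy(q, nxt, epsilon, rng)
            if done:
                target = reward
            elif method == "q_learning":
                # Off-policy: bootstrap from the best action in the next state.
                target = reward + gamma * max(q[nxt])
            else:
                # Sarsa (on-policy): bootstrap from the action actually selected next.
                target = reward + gamma * q[nxt][next_action]
            q[state][action] += alpha * (target - q[state][action])
            if done:
                break
            state, action = nxt, next_action
    return q


q_table = train("q_learning")
greedy_actions = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(N_STATES - 1)]
print(greedy_actions)  # greedy action in each non-terminal state
```

Calling `train("sarsa")` switches only the bootstrap target, which is the whole difference between the two methods in this sketch: Q-learning evaluates the greedy policy regardless of how the agent explores, whereas Sarsa evaluates the exploring policy itself.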

Standard Reinforcement Learning for Physics Research
The use of RL has spread across different physics applications in recent years, particularly in quantum physics. One of the first remarkable approaches proposed the application of RL to adaptive quantum metrology [15], in which RL-based control achieved better control of quantum processes than standard greedy approaches. In [24], RL is shown to be able to find the ground state and describe the unitary time evolution of complex interacting quantum systems. The ability of RL to optimize quantum-error-correction strategies, thus protecting qubits against noise, is shown in [25]. In another original work [26], control based on RL shows a performance similar to that of optimal control methods in many-body quantum systems of interacting qubits.
RL has also been applied in the field of quantum computing for dynamic non-convex optimization in ultra-cold-atom experimentation [14] and for measurement control in order to facilitate access to quantum states [13]. Other relevant works that address smart and efficient quantum measurements make use of active learning [27,28].
The advent of Deep Learning (DL), which has made it possible to solve data-driven problems that were unapproachable just a few years ago, has also had an impact on RL by means of deep policies and DL-based function approximations, leading to the so-called Deep Reinforcement Learning (DRL) [29]. DRL has already been used for efficient measurement of quantum devices [30], for control optimization in quantum state preparation [31], for gate control [32], and for robust digital quantum control to break adiabatic quantum control [16].
Although this subsection has focused on the application of RL to different quantum problems, its application to classical physics is also common. In [33], evolutionary RL was applied to estimate the likelihood of dynamical large deviations, thus showing the suitability of ML in path-extensive physics problems. An interesting review of RL approaches to solve fluid mechanics problems is provided in [34]. RL has also found application in other fields of physics, such as optics, e.g., for the adaptive control of astronomy systems [35], and thermodynamics, to optimize thermodynamic trajectories [36], thus learning previously unknown thermodynamic cycles. An interesting RL-based solution in dynamics is presented in [37], where an efficient sampling of rare trajectories is achieved; in particular, the idea is to make rare events typical, so that dynamical behaviors that appear with very low probability in non-equilibrium systems can be accessed in a statistically significant way.
Therefore, we can conclude that standard RL is a common choice for optimization and control in different physics problems, particularly within the quantum realm. Recent RL approaches such as DRL, which enhances RL with DL, have rapidly been applied to a number of physics control problems, with some relevant results, as shown in this subsection. Next, the quantum version of RL is presented in Section 2.2.

Quantum Reinforcement Learning
Quantum machine learning [38] is an emerging field where the aim is either to employ quantum devices to carry out more efficient ML calculations, or to use ML algorithms to better control and design quantum systems. Inside quantum machine learning, quantum reinforcement learning (QRL) has been explored in the past few years [39][40][41][42][43][44][45][46][47][48]. Here the motivation is to design "intelligent" quantum agents capable of interacting with their environment and adapting to it, by means of quantum resources such as entanglement and superposition.
In [39], a QRL algorithm based on Grover search was introduced. This kind of approach may achieve a polynomial speedup with respect to standard RL algorithms, by means of genuine quantum features such as superposition and entanglement.
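To give a rough idea of where the polynomial (quadratic) speedup of Grover-type search comes from, the following sketch simulates textbook Grover amplitude amplification on a classical computer. It is not the algorithm of [39]: it is a generic Grover search over `n_items` basis states with a single marked item, and reading the marked item as "the rewarded action" is only an illustrative assumption. All names and the problem size are hypothetical.

```python
import math
from typing import List


def grover_success_probability(n_items: int, marked: int, iterations: int) -> float:
    """Simulate Grover iterations on a real amplitude vector; return P(measure marked item)."""
    amp: List[float] = [1.0 / math.sqrt(n_items)] * n_items  # uniform superposition
    for _ in range(iterations):
        amp[marked] = -amp[marked]              # oracle: phase flip on the marked item
        mean = sum(amp) / n_items
        amp = [2.0 * mean - a for a in amp]     # diffusion: inversion about the mean
    return amp[marked] ** 2


n_items = 64                                            # hypothetical search-space size
n_iter = math.floor(math.pi / 4 * math.sqrt(n_items))   # about (pi/4) * sqrt(N) oracle calls
p_quantum = grover_success_probability(n_items, marked=5, iterations=n_iter)
p_single_guess = 1.0 / n_items                          # one uniformly random classical guess

print(n_iter, p_quantum, p_single_guess)
```

In this simulation the marked item is found with high probability after a number of oracle calls of order √N, whereas an unstructured classical search needs of order N queries. The simulation itself runs in classical time proportional to N per iteration, so it illustrates the query count only, not an actual speedup.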
The concept of a quantum agent was proposed in [40], which analyzed the situation of a quantum agent with a quantum processing unit that interacts via a classical channel with a classical environment. Similarly to [39], the authors showed that a polynomial speedup could be achieved in the processing of the information acquired from the environment by means of the quantum processor.
An exponential speedup was predicted in [41] via a quantum oracular environment. This paper also analyzed a general framework for quantum machine learning that includes quantum supervised and unsupervised learning as well.
The possibility of a speedup in quantum systems stems, in part, from the quantum-mechanical properties of the Hilbert space: quantum superpositions of different states in this space are possible, and quantum superpositions of composite states give rise to entanglement. This quantum parallelism is a crucial ingredient for achieving the quantum speedup in quantum technologies.
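The two ingredients named above, superposition and entanglement, can be illustrated with a minimal two-qubit sketch. It assumes real amplitudes ordered as |00>, |01>, |10>, |11>, and uses the concurrence of a pure two-qubit state, C = 2|a00·a11 − a01·a10|, as the entanglement check (C = 0 for a product state, C = 1 for a maximally entangled one). The function names are hypothetical and the example is not tied to any particular reference in this review.

```python
import math
from typing import List

State = List[float]  # real amplitudes ordered as |00>, |01>, |10>, |11>


def hadamard_on_first(psi: State) -> State:
    """Apply a Hadamard gate to the first qubit (creates a superposition)."""
    s = 1.0 / math.sqrt(2.0)
    a00, a01, a10, a11 = psi
    return [s * (a00 + a10), s * (a01 + a11), s * (a00 - a10), s * (a01 - a11)]


def cnot_first_controls_second(psi: State) -> State:
    """Apply a CNOT gate: flip the second qubit when the first qubit is 1."""
    a00, a01, a10, a11 = psi
    return [a00, a01, a11, a10]


def concurrence(psi: State) -> float:
    """Concurrence of a pure two-qubit state: 0 for product states, 1 for Bell states."""
    a00, a01, a10, a11 = psi
    return 2.0 * abs(a00 * a11 - a01 * a10)


zero_zero: State = [1.0, 0.0, 0.0, 0.0]               # |00>
plus_zero = hadamard_on_first(zero_zero)               # (|00> + |10>)/sqrt(2): superposition, still a product state
bell = cnot_first_controls_second(plus_zero)           # (|00> + |11>)/sqrt(2): entangled

print(concurrence(plus_zero), concurrence(bell))
```

The intermediate state is a superposition but factorizes into single-qubit states, while the final state cannot be written as a product of single-qubit states, which is what is meant by a superposition of composite states giving rise to entanglement.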
An implementation of QRL with superconducting circuits was proposed in [42], for basic protocols involving projective measurements and feedback inside the coherence time of the quantum system. This was extended in [43] to other quantum platforms that may not need to employ feedback, but just projective measurements and ancillary qubits.
Reference [44] introduced a QRL protocol where several copies of an environment state are available, and a quantum agent is able to learn this environment state via successive measurements on the copies and feedback on its own state, following the outcome of the measurements. A convenient balance between exploration and exploitation was considered to optimize the outcome. This proposal was carried out on the quantum platforms of quantum photonics [45] and superconducting circuits [46].
An extension of the previous theoretical work of [44] was proposed in [47] for learning quantum operations instead of quantum states.
Moreover, a review of the field of quantum machine learning, and specifically of QRL, on the quantum photonics platform was given in [48].
In summary, QRL is an exciting and intriguing field that in some situations may provide a speedup with respect to standard RL algorithms, and in general terms may allow for improved control and measurement of quantum systems. First steps in this direction have been taken, both in theory and in experiments on a variety of quantum platforms, and follow-up work should now focus on scalability, aiming at larger quantum agents that may provide a larger speedup. In this respect, one should point out that a different kind of speedup, in small quantum systems, has been obtained, e.g., in [45], in the context of a reduced amount of resources. This is what may be called the "reduced resource scenario", in which a speedup with QRL may be achieved, in this case with respect to standard quantum tomography. Further evidence that a quantum speedup with respect to classical computers may be achieved in this reduced resource scenario was given in an experimental implementation of a quantum memristor with a quantum photonics device [49]. In any case, achieving quantum agents of sizes above 50 qubits or so will open the way to promising applications in quantum control and ML.

Conclusions
In summary, RL is one of the most prominent paradigms in standard ML, and its connection to physics is producing a plethora of interesting results and perspectives for scientific and technological discovery. In both classical and quantum physics, RL is expected to accelerate the rate of breakthrough achievements. Thus, it will contribute to scientific productivity at a time when progress has become harder and more expensive, as most of the "low-hanging fruit" has already been picked since the first half of the 20th century.

Conflicts of Interest:
The authors declare no conflict of interest.