
Quantum Reinforcement Learning with Quantum Photonics

Departamento de Física Atómica, Molecular y Nuclear, Universidad de Sevilla, Apartado de Correos 1065, 41080 Sevilla, Spain
Photonics 2021, 8(2), 33;
Received: 26 December 2020 / Revised: 24 January 2021 / Accepted: 25 January 2021 / Published: 28 January 2021
(This article belongs to the Special Issue The Interplay between Photonics and Machine Learning)


Quantum machine learning has emerged as a promising paradigm that could accelerate machine learning calculations. Inside this field, quantum reinforcement learning aims at designing and building quantum agents that may exchange information with their environment and adapt to it, with the aim of achieving some goal. Different quantum platforms have been considered for quantum machine learning and specifically for quantum reinforcement learning. Here, we review the field of quantum reinforcement learning and its implementation with quantum photonics. This quantum technology may enhance quantum computation and communication, as well as machine learning, via the fruitful marriage between these previously unrelated fields.
Keywords: quantum machine learning; quantum reinforcement learning; quantum photonics; quantum technologies; quantum communication

1. Introduction

The field of quantum machine learning promises to employ quantum systems to accelerate machine learning calculations [1], as well as to employ machine learning techniques to better control quantum systems. In the past few years, several books and reviews on this topic have appeared [2,3,4,5,6,7,8].
Inside artificial intelligence and machine learning, the area of reinforcement learning designs “intelligent” agents capable of interacting with their outer world, the “environment”, and adapting to it via reward mechanisms [9], see Figure 1. These agents aim at achieving a final goal that maximizes their long-term rewards. This kind of machine learning protocol is, arguably, the most similar one to the way the human brain learns. The field of quantum machine learning has recently been exploring the fruitful combination of reinforcement learning protocols with quantum systems, giving rise to quantum reinforcement learning [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39].
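As an illustration of the agent–environment loop of Figure 1, the following minimal sketch (our own toy example, not taken from any of the cited works) trains a tabular Q-learning agent on a hypothetical two-state environment in which action 1 taken in state 1 is rewarded:

```python
import random

random.seed(0)

# Toy agent-environment loop (Figure 1): two states, two actions;
# the environment rewards action 1 taken in state 1. All names and
# parameter values here are illustrative assumptions.
n_states, n_actions = 2, 2
Q = [[0.0] * n_actions for _ in range(n_states)]
alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration

def step(state, action):
    """Environment: return (next_state, reward)."""
    reward = 1.0 if (state == 1 and action == 1) else 0.0
    return random.choice([0, 1]), reward

state = 0
for _ in range(2000):
    # epsilon-greedy policy: mostly exploit, occasionally explore
    if random.random() < epsilon:
        action = random.randrange(n_actions)
    else:
        action = max(range(n_actions), key=lambda a: Q[state][a])
    next_state, reward = step(state, action)
    # temporal-difference update of the action value
    Q[state][action] += alpha * (reward + gamma * max(Q[next_state])
                                 - Q[state][action])
    state = next_state
```

After training, the agent prefers the rewarded action in state 1, i.e., `Q[1][1]` exceeds `Q[1][0]`: the reward signal alone, iterated over many interactions, shapes the policy.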
Different quantum platforms are being considered for the implementation of quantum machine learning. Among them, quantum photonics seems promising because of its good integration with communication networks, its information processing at the speed of light, and the possible realization of quantum computations with integrated photonics [40]. Moreover, in the scenario with a reduced number of measurements, quantum reinforcement learning with quantum photonics has been shown to perform better than standard quantum tomography [16]. Quantum reinforcement learning with quantum photonics has been proposed [15,17,20] and implemented [16,19] in diverse works. Even before these articles appeared, a pioneering experiment on quantum supervised and unsupervised learning with quantum photonics was carried out [41].
In this review, we first give an overview of the field of quantum reinforcement learning, focusing mainly on quantum devices employed for reinforcement learning algorithms [10,11,12,13,14,15,16,17,18,19,20], in Section 2. Later on, we review the proposal for measurement-based adaptation protocol with quantum reinforcement learning [15] and its experimental implementation with quantum photonics [16], in Section 3. Subsequently, we describe further works in quantum reinforcement learning with quantum photonics [19,20], in Section 4. Finally, we give our conclusions in Section 5.

2. Quantum Reinforcement Learning

The fields of reinforcement learning and quantum technologies have started to merge recently in a novel area, named quantum reinforcement learning [10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,35]. A subset inside this field is composed of articles studying quantum systems that carry out reinforcement learning algorithms, ideally with some speedup [10,11,12,13,14,15,16,17,18,19,20].
In Ref. [10], a pioneering proposal for reinforcement learning using quantum systems was put forward. It employed a Grover-like search algorithm, which could provide a quadratic speedup in the learning process as compared to classical computers [10].
Ref. [11] provided a quantum algorithm for reinforcement learning in which a quantum agent, possessing a quantum processor, can couple classically with a classical environment, obtaining classical information from it. The speedup in this case would come from the quantum processing of the classical information, which could be done faster than with classical computers. This is also based on Grover search, with a corresponding quadratic speedup.
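Since both Refs. [10,11] build on Grover-type amplitude amplification, a minimal statevector sketch of Grover search (our illustration, not the actual agents of those papers) shows where the quadratic gain comes from: one "good" item out of N is found with high probability after roughly (π/4)√N oracle calls instead of ~N classical queries:

```python
import numpy as np

# Grover search on a vector of N amplitudes; 'marked' is the index the
# oracle rewards (both values are illustrative assumptions).
N, marked = 16, 11
state = np.full(N, 1 / np.sqrt(N))            # uniform superposition

n_iter = int(round(np.pi / 4 * np.sqrt(N)))   # ~pi/4 * sqrt(N) = 3 for N = 16
for _ in range(n_iter):
    state[marked] *= -1                       # oracle: flip the marked amplitude
    state = 2 * state.mean() - state          # diffusion: inversion about the mean

p_marked = state[marked] ** 2                 # success probability, ~0.96 here
```

A classical agent sampling actions uniformly would need on the order of N = 16 trials to find the rewarded item; here 3 oracle calls suffice, which is the origin of the quadratic speedup invoked in Refs. [10,11].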
In Ref. [12], a quantum algorithm considers a quantum agent coupled to a quantum oracular environment, attaining a proven speedup with this kind of configuration, which can be exponential in some situations. The quantum algorithm can be applied to diverse kinds of learning: not only reinforcement learning, but also supervised and unsupervised learning.
Refs. [10,11,12] thus achieve speedups with respect to classical algorithms. While the first two rely on a quadratic gain due to a Grover-like algorithm, the latter achieves its proven speedup via a quantum oracular environment.
The series of articles in Refs. [13,14,15,16,17,18] study quantum reinforcement learning protocols with basic quantum systems coupled to small quantum environments. These works focus mainly on proposals for implementations [13,14,15,17] as well as experimental realizations in quantum photonics [16] and superconducting circuits [18]. In the theoretical proposals, small few-qubit quantum systems are proposed both for quantum agents and quantum environments. In Ref. [13], the aim of the agent is to achieve a final state which cannot be distinguished from the environment state, even if the latter has to be modified, as it is a single-copy protocol. In order to achieve this goal, measurements are allowed, as well as classical feedback inside the coherence time. Ref. [14] extends the previous protocol to the case in which measurements are not considered, but instead further ancillary qubits coupled via entangling gates to agent and environment are employed, and later on disregarded. In Ref. [15], several identical copies of the environment state are considered, such that the agent, via trial and error, or, equivalently, a balance between exploration and exploitation, iteratively approaches the environment state. This proposal was carried out in a quantum photonics experiment [16] as well as with superconducting circuits [18]. In Ref. [17], a further extension of Ref. [15] to operator estimation, instead of state estimation, was proposed and analyzed.
Ref. [16] also obtained a speedup with respect to standard quantum tomography in the scenario with a reduced amount of resources, in the sense of a reduced number of measurements.
Finally, Ref. [20] considered different paradigms of learning inside a reinforcement learning framework, including projective simulation [42], and a possible implementation with quantum photonics devices. The high repetition rates, high bandwidth, and low crosstalk of the latter, together with the possibility of propagation over long distances, make this quantum platform an attractive one for this kind of protocol.

3. Measurement-Based Adaptation Protocol with Quantum Reinforcement Learning Implemented with Quantum Photonics

3.1. Theoretical Proposal

In this section we review the proposal of Ref. [15], which introduces a quantum agent that can obtain information from several identical copies of an unknown environment state via measurement, feedback, and the application of partially random unitary gates. The randomness of these operations is reduced as the agent approaches the environment state, ideally converging to it.
The detailed protocol is as follows. One assumes a quantum system, the agent (A), and many identical copies of an unknown quantum state, the environment (E). An auxiliary system, the register (R), which interacts with E, is also considered. Information about E is obtained by measuring R, and the result is employed as input to a reward function (RF). Finally, one carries out a partially random unitary operation on A, depending on the output of the RF. The aim is to increase the overlap between A and E without measuring the state of A.
In the rest of the description of the protocol we use the following notation: the subscripts A, R, and E refer to each subsystem, while the superscripts denote the iteration. For example, $O_\alpha^{(k)}$ denotes the operator $O$ acting on subsystem $\alpha$ during the $k$th iteration. When we do not employ indices, we refer to a general entity over the iterations as well as the subsystems.
Here we will review the case in which the subsystems are described by single-qubit states [15]; for the multilevel case we refer to Ref. [15]. One considers that A (R) is encoded in $|0\rangle_{A(R)}$, while E is represented by an arbitrary state $|E\rangle_E = \cos(\theta^{(1)}/2)|0\rangle_E + e^{i\phi^{(1)}}\sin(\theta^{(1)}/2)|1\rangle_E$. The initial state is given by

$$|\psi^{(1)}\rangle = |0\rangle_A |0\rangle_R \left[ \cos(\theta^{(1)}/2)|0\rangle_E + e^{i\phi^{(1)}}\sin(\theta^{(1)}/2)|1\rangle_E \right]. \qquad (1)$$
One subsequently introduces the ingredients of the reinforcement learning protocol: the policy, the RF, and the value function (VF). We point out that we consider a definition for the VF which differs from that of standard reinforcement learning but which fulfills our purposes. To carry out the policy, one performs a controlled-NOT (CNOT) gate $U_{E,R}^{\mathrm{NOT}}$, with E the control and R the target (namely, the interaction with the environment system), so that the information of E is transferred into R, achieving
$$|\Phi^{(1)}\rangle = U_{E,R}^{\mathrm{NOT}} |\psi^{(1)}\rangle = |0\rangle_A \left[ \cos(\theta^{(1)}/2)|0\rangle_R|0\rangle_E + e^{i\phi^{(1)}}\sin(\theta^{(1)}/2)|1\rangle_R|1\rangle_E \right]. \qquad (2)$$
One then measures the register qubit in the $\{|0\rangle, |1\rangle\}$ basis, with probabilities $p_0^{(1)} = \cos^2(\theta^{(1)}/2)$ and $p_1^{(1)} = \sin^2(\theta^{(1)}/2)$ of obtaining $|0\rangle$ or $|1\rangle$, respectively (namely, extraction of information). If the outcome is $|0\rangle$, one has collapsed E onto A, so one does nothing, while if the outcome is $|1\rangle$, one has measured the projection of E orthogonal to A, so one accordingly updates the agent. Given that no further information about the environment is available, one applies a partially random unitary gate on A, $U_A^{(1)}(\alpha^{(1)}, \beta^{(1)}) = e^{i S_A^{z,(1)} \alpha^{(1)}} e^{i S_A^{x,(1)} \beta^{(1)}}$ (action), where $\alpha^{(1)}$ and $\beta^{(1)}$ are random angles given by $\alpha(\beta)^{(1)} = \xi_{\alpha(\beta)} \Delta^{(1)}$, with $\xi_{\alpha(\beta)} \in [-1/2, 1/2]$ a random number and $\Delta^{(1)}$ the random angle range, so that $\alpha(\beta)^{(1)} \in [-\Delta^{(1)}/2, \Delta^{(1)}/2]$, and $S_A^{k,(1)} = S^k$ is the $k$th component of the spin. Subsequently, one reinitializes the register qubit and takes a fresh copy of E, obtaining the following initial state for the second iteration:
$$|\psi^{(2)}\rangle = \bar{U}_A^{(1)} |0\rangle_A |0\rangle_R |E\rangle_E = |\bar{0}\rangle_A^{(2)} |0\rangle_R |E\rangle_E, \qquad (3)$$
$$\bar{U}_A^{(1)} = m^{(1)} U_A^{(1)}(\alpha^{(1)}, \beta^{(1)}) + (1 - m^{(1)}) I_A. \qquad (4)$$
Here, $m^{(1)} \in \{0, 1\}$ denotes the result of the measurement, $I_A$ is the identity operator, and we express the new agent state as $|\bar{0}\rangle_A^{(2)} = \bar{U}_A^{(1)} |0\rangle_A$.
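The first iteration, Equations (1)–(4), can be checked numerically. The sketch below (our own, with an assumed qubit ordering A, R, E) builds the initial state, applies the CNOT with E as control and R as target, and verifies the register statistics quoted above:

```python
import numpy as np

# Assumed ordering of tensor factors: |A> (x) |R> (x) |E>; theta and phi are
# arbitrary illustrative values for the unknown environment state.
theta, phi = 1.2, 0.3
ket0 = np.array([1, 0], dtype=complex)
env = np.array([np.cos(theta / 2), np.exp(1j * phi) * np.sin(theta / 2)])

psi = np.kron(np.kron(ket0, ket0), env)       # |psi^(1)> = |0>_A |0>_R |E>_E

# CNOT on the (R, E) pair with E as control and R as target, in the |r e> basis
cnot_RE = np.zeros((4, 4), dtype=complex)
cnot_RE[0, 0] = cnot_RE[3, 1] = cnot_RE[2, 2] = cnot_RE[1, 3] = 1

phi_out = np.kron(np.eye(2), cnot_RE) @ psi   # state of Equation (2)

# probability that the register reads 1: sum over basis indices with r = 1
p1 = sum(abs(phi_out[i]) ** 2 for i in (2, 3, 6, 7))
```

The computed `p1` reproduces $p_1^{(1)} = \sin^2(\theta^{(1)}/2)$, confirming that the register inherits the environment statistics without the agent being measured.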
Subsequently, one considers the RF to change the exploration range of the $k$th iteration, $\Delta^{(k)}$, as

$$\Delta^{(k)} = \left[ (1 - m^{(k-1)}) R + m^{(k-1)} P \right] \Delta^{(k-1)}, \qquad (5)$$
where $m^{(k-1)}$ denotes the measurement result of the $(k-1)$th iteration, and $R$ and $P$ are the reward and punishment ratios, respectively.
Equation (5) expresses that $\Delta$ is changed to $R\Delta$ for the subsequent iteration whenever $m = 0$, and to $P\Delta$ whenever $m = 1$. In the described protocol one takes, for the sake of simplicity, $R = \epsilon < 1$ and $P = 1/\epsilon > 1$, in such a way that the value of $\Delta$ is reduced every time $|0\rangle$ is measured and grows otherwise. Moreover, given that $R \cdot P = 1$, reward and punishment carry balanced weight: if the algorithm yields an equal number of results 0 and 1, the exploration range is unchanged. Additionally, one defines the VF as the value of $\Delta^{(n)}$ after all iterations have taken place. Thus, $\Delta^{(n)} \to 0$ whenever the algorithm converges to a maximal overlap between A and E.
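The RF of Equation (5) is simple enough to state in a few lines of code; the sketch below (with an assumed value $\epsilon = 0.8$; the variable names are ours) shows the multiplicative shrink/grow dynamics of the exploration range:

```python
# Sketch of the reward function of Eq. (5): the exploration range Delta
# shrinks by R = eps on a reward (m = 0) and grows by P = 1/eps on a
# punishment (m = 1); eps = 0.8 is an illustrative choice.
def update_delta(delta, m, eps=0.8):
    R, P = eps, 1.0 / eps          # reward and punishment ratios, R * P = 1
    return ((1 - m) * R + m * P) * delta

delta = 4.0
for m in [0, 0, 1, 0]:             # three rewards, one punishment
    delta = update_delta(delta, m)
# net factor: eps * eps * (1/eps) * eps = eps**2
```

Because $R \cdot P = 1$, equal numbers of rewards and punishments cancel exactly; only a surplus of rewards drives $\Delta \to 0$.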
In order to show in further detail how the algorithm works, we consider the kth step. The initial state in the protocol reads
$$|\psi^{(k)}\rangle = |\bar{0}\rangle_A^{(k)} |0\rangle_R |E\rangle_E, \qquad (6)$$

with $|\bar{0}\rangle_A^{(k)} = \bar{\mathcal{U}}_A^{(k)} |0\rangle_A$ and $\bar{\mathcal{U}}_A^{(k)} = \bar{U}_A^{(k-1)} \bar{\mathcal{U}}_A^{(k-1)}$, where $\bar{\mathcal{U}}_A^{(1)} = I_A$ and $\bar{U}_A^{(j)}$ is given by Equation (4). Moreover, $U_A^{(j)} = e^{i S_A^{z,(j)} \alpha^{(j)}} e^{i S_A^{x,(j)} \beta^{(j)}}$, with

$$S_A^{z,(j)} = \frac{1}{2} \left( |\bar{0}\rangle_A^{(j)} \langle \bar{0}| - |\bar{1}\rangle_A^{(j)} \langle \bar{1}| \right) = \bar{U}_A^{(j-1)} S_A^{z,(j-1)} \bar{U}_A^{(j-1)\dagger},$$
$$S_A^{x,(j)} = \frac{1}{2} \left( |\bar{0}\rangle_A^{(j)} \langle \bar{1}| + |\bar{1}\rangle_A^{(j)} \langle \bar{0}| \right) = \bar{U}_A^{(j-1)} S_A^{x,(j-1)} \bar{U}_A^{(j-1)\dagger}, \qquad (7)$$
where $_A^{(j)}\langle \bar{0} | \bar{1} \rangle_A^{(j)} = 0$. One can then express the state of system E in the Bloch basis, employing $|\bar{0}\rangle^{(k)}$ as reference, and subsequently act with the quantum gate $U_E^{(k)}$, achieving for E

$$U_E^{(k)} |E\rangle_E = U_E^{(k)} \left[ \cos(\theta^{(k)}/2) |\bar{0}\rangle_E^{(k)} + e^{i\phi^{(k)}} \sin(\theta^{(k)}/2) |\bar{1}\rangle_E^{(k)} \right] = \cos(\theta^{(k)}/2) |0\rangle_E + e^{i\phi^{(k)}} \sin(\theta^{(k)}/2) |1\rangle_E = |\bar{E}\rangle_E^{(k)}. \qquad (8)$$
One can then express the states $|\bar{0}\rangle^{(k)}$ and $|\bar{1}\rangle^{(k)}$ in terms of the initial logical vectors $|0\rangle$ and $|1\rangle$ as well as $\theta^{(k)}$, $\theta^{(1)}$, $\phi^{(k)}$, and $\phi^{(1)}$ in the following way,

$$|\bar{0}\rangle^{(k)} = \cos\!\left( \frac{\theta^{(1)} - \theta^{(k)}}{2} \right) |0\rangle + e^{i\phi^{(1)}} \sin\!\left( \frac{\theta^{(1)} - \theta^{(k)}}{2} \right) |1\rangle,$$
$$|\bar{1}\rangle^{(k)} = -e^{-i\phi^{(k)}} \sin\!\left( \frac{\theta^{(1)} - \theta^{(k)}}{2} \right) |0\rangle + e^{i(\phi^{(1)} - \phi^{(k)})} \cos\!\left( \frac{\theta^{(1)} - \theta^{(k)}}{2} \right) |1\rangle. \qquad (9)$$
Accordingly, the unitary gate $U^{(k)}$ carries out the rotation required to map $|\bar{0}\rangle^{(k)} \to |0\rangle$ and $|\bar{1}\rangle^{(k)} \to |1\rangle$. Subsequently, one applies the operation $U_{E,R}^{\mathrm{NOT}}$,
$$|\Phi^{(k)}\rangle = U_{E,R}^{\mathrm{NOT}} |\bar{0}\rangle_A^{(k)} |0\rangle_R |\bar{E}\rangle_E = |\bar{0}\rangle_A^{(k)} \left[ \cos(\theta^{(k)}/2) |0\rangle_R |0\rangle_E + e^{i\phi^{(k)}} \sin(\theta^{(k)}/2) |1\rangle_R |1\rangle_E \right], \qquad (10)$$
and later one measures R, with respective probabilities $p_0^{(k)} = \cos^2(\theta^{(k)}/2)$ and $p_1^{(k)} = \sin^2(\theta^{(k)}/2)$ for the results $m^{(k)} = 0$ and $m^{(k)} = 1$. Then, one applies the RF given by Equation (5). We remark that, statistically, when $p_0^{(k)} \to 1$, $\Delta \to 0$, and when $p_1^{(k)} \to 1$, $\Delta \to 4\pi$. With respect to the exploration–exploitation tradeoff, this implies that whenever exploitation is reduced (one obtains $|1\rangle$ many times), exploration is increased (the value of $\Delta$ grows) so that the probability of a beneficial change increases, whereas whenever exploitation grows (one obtains $|0\rangle$ often), the exploration range is reduced to permit only small subsequent modifications.
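The full single-qubit protocol can be simulated classically. In the sketch below (our own simplification), the register measurement is replaced by sampling $m = 0$ with probability equal to the agent–environment overlap, which is exactly the statistics the register inherits through the CNOT; rotations are applied in the lab frame rather than the updated frame of Equation (7), and all parameter values are illustrative:

```python
import numpy as np

def run_protocol(seed, n_iter=500, eps=0.75):
    """One run of a simplified single-qubit adaptation protocol (Ref. [15])."""
    rng = np.random.default_rng(seed)
    t, p = 2.0, 0.5                       # unknown environment angles (assumed)
    env = np.array([np.cos(t / 2), np.exp(1j * p) * np.sin(t / 2)])
    agent = np.array([1.0 + 0j, 0.0])     # agent starts in |0>
    delta = 2 * np.pi                     # initial exploration range

    for _ in range(n_iter):
        fid = abs(np.vdot(agent, env)) ** 2
        m = 0 if rng.random() < fid else 1            # simulated register outcome
        if m == 1:
            # action: partially random rotation exp(i a Sz) exp(i b Sx)
            a, b = delta * (rng.random(2) - 0.5)
            Rz = np.array([[np.exp(1j * a / 2), 0],
                           [0, np.exp(-1j * a / 2)]])
            Rx = np.array([[np.cos(b / 2), 1j * np.sin(b / 2)],
                           [1j * np.sin(b / 2), np.cos(b / 2)]])
            agent = Rz @ Rx @ agent
        delta *= eps if m == 0 else 1 / eps           # reward function, Eq. (5)
    return abs(np.vdot(agent, env)) ** 2

fidelities = [run_protocol(s) for s in range(10)]
```

As the agent locks onto the environment state, successes dominate and the exploration range shrinks; in this simplified simulation the average final fidelity tends to sit well above the random-guess value of 1/2, mirroring qualitatively the behavior reported in Refs. [15,16].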

3.2. Implementation with Quantum Photonics

In Ref. [16], an experiment implementing the proposal described in Section 3.1 was carried out. The experiment aimed to estimate an unknown quantum state in a photonic system in a scenario with a reduced number of copies. A partially quantum reinforcement learning protocol was used to adapt a qubit state, the “agent”, to the given unknown quantum state, the “environment”, via iterated single-shot projective measurements followed by feedback, with the aim of achieving maximum fidelity. The experimental setup, a quantum photonics device, could modify the available parameters to change the agent system according to the measurement results “0” and “1” on the environment state, namely, reward/punishment feedback. The experimental results showed that the protocol provides a technique for estimating an unknown photonic quantum state whenever only a limited number of copies is given, and that it can be extended to higher-dimensional, multipartite, and density-matrix quantum state cases. The achieved fidelities for the single-photon case exceeded 88% after 50 iterations with an appropriate reward/punishment ratio.

4. Further Developments of Quantum Reinforcement Learning with Quantum Photonics

Without the aim of being exhaustive, here we briefly describe some other works that have appeared in the literature on the field of quantum reinforcement learning with quantum photonics.
In Ref. [19], it was argued that future breakthroughs in experimental quantum science will involve complex quantum systems and will therefore require complex and expensive experiments. Designing such experiments is hard and could be enhanced with the aid of artificial intelligence. In this reference, the authors presented an automated learning system that learns to create complex quantum experiments, without relying on previous knowledge or, sometimes, incorrect intuition. Their system not only learned how to design quantum experiments better than previous works, but also discovered nontrivial experimental tools in the process. They concluded that learning devices can provide crucial advances in the design and generation of novel experiments.
In Ref. [20], the authors introduced a blueprint for a quantum photonics realization of active learning devices employing machine learning algorithms such as SARSA, Q-learning, and projective simulation. They carried out numerical calculations to evaluate the performance of their algorithm in customary reinforcement learning situations, finding that reasonable amounts of experimental error can be tolerated and, sometimes, can even benefit the learning protocol. Among other features, they showed that their designed device would enable abstraction and generalization, two aspects considered crucial for artificial intelligence. For their studied model they consider a quantum photonics platform which, they argue, is scalable as well as simple, and proof-of-principle integration in quantum photonics devices seems feasible with near-term platforms.
More specifically, Ref. [20] makes two main contributions. (i) Firstly, it describes a quantum photonics platform that allows reinforcement learning algorithms to work directly in combination with optical applications. For this aim, the authors focus on linear optics for its simplicity and well-established fabrication technology when compared to solid-state processors. They mention that nanosecond-scale reconfigurability and routing have already been achieved, and, furthermore, photonic platforms allow decision-making at the speed of light, the fastest possible, constrained only by generation and detection rates. Energy efficiency in memories is a further bonus that photonic technologies may provide [20]. (ii) The second achievement of the article is the analysis of a variant of projective simulation based on binary decision trees, which is closely connected to standard projective simulation and suitable for photonic circuit implementations. Finally, they discuss how this development would enable key aspects of artificial intelligence, namely generalization and abstraction [20].

5. Conclusions

In this article, we reviewed the field of quantum reinforcement learning with quantum photonics. Without the goal of being exhaustive, we first reviewed the area of quantum reinforcement learning in general, showing that automated quantum agents can sometimes provide enhancements with respect to classical computers. Later on, we described a theoretical proposal for a quantum reinforcement learning protocol for state estimation and its experimental realization with quantum photonics. This protocol has been shown to provide a speedup with respect to standard quantum tomography in the reduced-resource scenario. Finally, we briefly reviewed some other works in the field of quantum reinforcement learning with quantum photonics that have appeared in the literature.
The field of quantum reinforcement learning may provide quantum systems with a larger degree of autonomy and independence. Quantum photonics is among the quantum platforms where this kind of technology could be highly fruitful. Even though the number of qubits is often not as large as in other platforms such as trapped ions and superconducting circuits, quantum photonics processes information at the speed of light, and it can be suitably interfaced with long-distance quantum communication protocols. A long-term goal in quantum reinforcement learning could be to combine this paradigm with quantum artificial life [8]. This would allow for fully autonomous quantum individuals that can reproduce, evolve, and interact with and adapt to their environment. Further benefits in areas such as neuroscience could emerge as a consequence of this promising avenue.


This research was funded by PGC2018-095113-B-I00, PID2019-104002GB-C21, and PID2019-104002GB-C22 (MCIU/AEI/FEDER, UE).

Conflicts of Interest

The author declares no conflict of interest.


  1. Russell, S.; Norvig, P. Artificial Intelligence: A Modern Approach; Pearson: London, UK, 2009. [Google Scholar]
  2. Wittek, P. Quantum Machine Learning; Academic Press: Cambridge, MA, USA, 2014. [Google Scholar]
  3. Schuld, M.; Petruccione, F. Supervised Learning with Quantum Computers; Springer: Berlin/Heidelberg, Germany, 2018. [Google Scholar]
  4. Schuld, M.; Sinayskiy, I.; Petruccione, F. An introduction to quantum machine learning. Contemp. Phys. 2015, 56, 172. [Google Scholar] [CrossRef]
  5. Biamonte, J.; Wittek, P.; Pancotti, N.; Rebentrost, P.; Wiebe, N.; Lloyd, S. Quantum machine learning. Nature 2017, 549, 195. [Google Scholar] [CrossRef] [PubMed]
  6. Dunjko, V.; Briegel, H.J. Machine learning & artificial intelligence in the quantum domain: A review of recent progress. Rep. Prog. Phys. 2018, 81, 074001. [Google Scholar] [PubMed]
  7. Schuld, M.; Sinayskiy, I.; Petruccione, F. The quest for a Quantum Neural Network. Quantum Inf. Process. 2014, 13, 2567. [Google Scholar] [CrossRef]
  8. Lamata, L. Quantum machine learning and quantum biomimetics: A perspective. Mach. Learn. Sci. Technol. 2020, 1, 033002. [Google Scholar] [CrossRef]
  9. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  10. Dong, D.; Chen, C.; Li, H.; Tarn, T.-J. Quantum Reinforcement Learning. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2008, 38, 1207. [Google Scholar] [CrossRef]
  11. Paparo, G.D.; Dunjko, V.; Makmal, A.; Martin-Delgado, M.A.; Briegel, H.J. Quantum Speedup for Active Learning Agents. Phys. Rev. X 2014, 4, 031002. [Google Scholar] [CrossRef]
  12. Dunjko, V.; Taylor, J.M.; Briegel, H.J. Quantum-Enhanced Machine Learning. Phys. Rev. Lett. 2016, 117, 130501. [Google Scholar] [CrossRef]
  13. Lamata, L. Basic protocols in quantum reinforcement learning with superconducting circuits. Sci. Rep. 2017, 7, 1609. [Google Scholar] [CrossRef]
  14. Cárdenas-López, F.A.; Lamata, L.; Retamal, J.C.; Solano, E. Multiqubit and multilevel quantum reinforcement learning with quantum technologies. PLoS ONE 2018, 13, e0200455. [Google Scholar] [CrossRef]
  15. Albarrán-Arriagada, F.; Retamal, J.C.; Solano, E.; Lamata, L. Measurement-based adaptation protocol with quantum reinforcement learning. Phys. Rev. A 2018, 98, 042315. [Google Scholar] [CrossRef]
  16. Yu, S.; Albarrán-Arriagada, F.; Retamal, J.C.; Wang, Y.-T.; Liu, W.; Ke, Z.-J.; Meng, Y.; Li, Z.-P.; Tang, J.-S.; Solano, E.; et al. Reconstruction of a Photonic Qubit State with Reinforcement Learning. Adv. Quantum Technol. 2019, 2, 1800074. [Google Scholar] [CrossRef]
  17. Albarrán-Arriagada, F.; Retamal, J.C.; Solano, E.; Lamata, L. Reinforcement learning for semi-autonomous approximate quantum eigensolver. Mach. Learn. Sci. Technol. 2020, 1, 015002. [Google Scholar] [CrossRef]
  18. Olivares-Sánchez, J.; Casanova, J.; Solano, E.; Lamata, L. Measurement-Based Adaptation Protocol with Quantum Reinforcement Learning in a Rigetti Quantum Computer. Quantum Rep. 2020, 2, 293–304. [Google Scholar] [CrossRef]
  19. Melnikov, A.A.; Nautrup, H.P.; Krenn, M.; Dunjko, V.; Tiersch, M.; Zeilinger, A.; Briegel, H.J. Active learning machine learns to create new quantum experiments. Proc. Natl. Acad. Sci. USA 2018, 115, 1221. [Google Scholar] [CrossRef]
  20. Flamini, F.; Hamann, A.; Jerbi, S.; Trenkwalder, L.M.; Nautrup, H.P.; Briegel, H.J. Photonic architecture for reinforcement learning. New J. Phys. 2020, 22, 045002. [Google Scholar] [CrossRef]
  21. Fösel, T.; Tighineanu, P.; Weiss, T.; Marquardt, F. Reinforcement Learning with Neural Networks for Quantum Feedback. Phys. Rev. X 2018, 8, 031084. [Google Scholar] [CrossRef]
  22. Bukov, M. Reinforcement learning for autonomous preparation of Floquet-engineered states: Inverting the quantum Kapitza oscillator. Phys. Rev. B 2018, 98, 224305. [Google Scholar] [CrossRef]
  23. Bukov, M.; Day, A.G.R.; Sels, D.; Weinberg, P.; Polkovnikov, A.; Mehta, P. Reinforcement Learning in Different Phases of Quantum Control. Phys. Rev. X 2018, 8, 031086. [Google Scholar] [CrossRef]
  24. Melnikov, A.A.; Sekatski, P.; Sangouard, N. Setting up experimental Bell test with reinforcement learning. arXiv 2020, arXiv:2005.01697. [Google Scholar]
  25. Mackeprang, J.; Dasari, D.B.R.; Wrachtrup, J. A Reinforcement Learning approach for Quantum State Engineering. arXiv 2019, arXiv:1908.05981. [Google Scholar]
  26. Schäfer, F.; Kloc, M.; Bruder, C.; Lörch, N. A differentiable programming method for quantum control. arXiv 2020, arXiv:2002.08376. [Google Scholar] [CrossRef]
  27. Sgroi, P.; Palma, G.M.; Paternostro, M. Reinforcement learning approach to non-equilibrium quantum thermodynamics. arXiv 2020, arXiv:2004.07770. [Google Scholar]
  28. Wallnöfer, J.; Melnikov, A.A.; Dür, W.; Briegel, H.J. Machine learning for long-distance quantum communication. arXiv 2019, arXiv:1904.10797. [Google Scholar]
  29. Zhang, X.-M.; Wei, Z.; Asad, R.; Yang, X.-C.; Wang, X. When does reinforcement learning stand out in quantum control? A comparative study on state preparation. npj Quantum Inf. 2019, 5, 85. [Google Scholar] [CrossRef]
  30. Xu, H.; Li, J.; Liu, L.; Wang, Y.; Yuan, H.; Wang, X. Generalizable control for quantum parameter estimation through reinforcement learning. npj Quantum Inf. 2019, 5, 82. [Google Scholar] [CrossRef]
  31. Sweke, R.; Kesselring, M.S.; van Nieuwenburg, E.P.L.; Eisert, J. Reinforcement Learning Decoders for Fault-Tolerant Quantum Computation. arXiv 2018, arXiv:1810.07207. [Google Scholar] [CrossRef]
  32. Andreasson, P.; Johansson, J.; Liljestrand, S.; Granath, M. Quantum error correction for the toric code using deep reinforcement learning. Quantum 2019, 3, 183. [Google Scholar] [CrossRef]
  33. Nautrup, H.P.; Delfosse, N.; Dunjko, V.; Briegel, H.J.; Friis, N. Optimizing Quantum Error Correction Codes with Reinforcement Learning. Quantum 2019, 3, 215. [Google Scholar] [CrossRef]
  34. Fitzek, D.; Eliasson, M.; Kockum, A.F.; Granath, M. Deep Q-learning decoder for depolarizing noise on the toric code. Phys. Rev. Res. 2020, 2, 023230. [Google Scholar] [CrossRef]
  35. Fösel, T.; Krastanov, S.; Marquardt, F.; Jiang, L. Efficient cavity control with SNAP gates. arXiv 2020, arXiv:2004.14256. [Google Scholar]
  36. McKiernan, K.A.; Davis, E.; Alam, M.S.; Rigetti, C. Automated quantum programming via reinforcement learning for combinatorial optimization. arXiv 2019, arXiv:1908.08054. [Google Scholar]
  37. Garcia-Saez, A.; Riu, J. Quantum Observables for continuous control of the Quantum Approximate Optimization Algorithm via Reinforcement Learning. arXiv 2019, arXiv:1911.09682. [Google Scholar]
  38. Khairy, K.; Shaydulin, R.; Cincio, L.; Alexeev, Y.; Balaprakash, P. Learning to Optimize Variational Quantum Circuits to Solve Combinatorial Problems. arXiv 2019, arXiv:1911.11071. [Google Scholar] [CrossRef]
  39. Yao, J.; Bukov, M.; Lin, L. Policy Gradient based Quantum Approximate Optimization Algorithm. arXiv 2020, arXiv:2002.01068. [Google Scholar]
  40. Flamini, F.; Spagnolo, N.; Sciarrino, F. Photonic quantum information processing: A review. Rep. Prog. Phys. 2019, 82, 016001. [Google Scholar] [CrossRef]
  41. Cai, X.-D.; Wu, D.; Su, Z.-E.; Chen, M.-C.; Wang, X.-L.; Li, L.; Liu, N.-L.; Lu, C.-Y.; Pan, J.-W. Entanglement-Based Machine Learning on a Quantum Computer. Phys. Rev. Lett. 2015, 114, 110504. [Google Scholar] [CrossRef]
  42. Briegel, H.J.; De las Cuevas, G. Projective simulation for artificial intelligence. Sci. Rep. 2012, 2, 400. [Google Scholar] [CrossRef]
Figure 1. Reinforcement learning protocol. A system, called agent, interacts with its external world, the environment, carrying out some action on it, while receiving information from it. Afterwards, the agent acts accordingly in order to achieve some long-term goal, via feedback with rewards, iterating the process several times.


Prof. Lucas Lamata is Associate Professor of Theoretical Physics at the Departamento de Física Atómica, Molecular y Nuclear, Facultad de Física, Universidad de Sevilla, Spain. He carried out his PhD at CSIC, Madrid, and Universidad Autónoma de Madrid, in 2007, with an Extraordinary Award for a PhD in Physics. Later on, he was a Humboldt Fellow and Max Planck Postdoctoral Fellow for more than three years at the Max Planck Institute of Quantum Optics, Garching, Germany. Subsequently, he was a Marie Curie IEF Postdoctoral Fellow, Ramón y Cajal Fellow, and tenured scientist at Universidad del País Vasco, Bilbao, Spain, for more than eight years. In 2019, he obtained a tenured Associate Professor position at Universidad de Sevilla, where he leads a group on quantum optics, quantum technologies, and quantum artificial intelligence.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.