Future Internet
  • Article
  • Open Access

22 November 2024

QRLIT: Quantum Reinforcement Learning for Database Index Tuning

Diogo Barbosa, Le Gruenwald, Laurent d’Orazio and Jorge Bernardino
1 Institute of Engineering of Coimbra—ISEC, Polytechnic University of Coimbra, Rua da Misericórdia, Lagar dos Cortiços, S. Martinho do Bispo, 3045-093 Coimbra, Portugal
2 School of Computer Science, The University of Oklahoma, Norman, OK 73019, USA
3 IRISA, CNRS, University of Rennes, Rue de Kerampont, 22305 Lannion Cedex, France
* Author to whom correspondence should be addressed.

Abstract

Selecting indexes capable of reducing the cost of query processing in database systems is a challenging task, especially in large-scale applications. Quantum computing has been investigated with promising results in areas related to database management, such as query optimization, transaction scheduling, and index tuning. Promising results have also been seen when reinforcement learning is applied for database tuning in classical computing. However, there is no existing research with implementation details and experiment results for index tuning that takes advantage of both quantum computing and reinforcement learning. This paper proposes a new algorithm called QRLIT that uses the power of quantum computing and reinforcement learning for database index tuning. Experiments using the database TPC-H benchmark show that QRLIT exhibits superior performance and a faster convergence compared to its classical counterpart.

1. Introduction

Executing queries in relational database applications with large amounts of data can take significant time. Database management systems (DBMSs) offer various mechanisms to reduce query execution time. One such mechanism is the creation and management of indexes. Creating column indexes is a strategy that reduces the time required to search and retrieve data. However, this index selection problem, which is to find an optimal set of indexes (i.e., an optimal index configuration) for given database tables, is an NP-hard problem [1,2]. This problem becomes more complex for large-scale database applications. Furthermore, the necessity for deleting, modifying, and inserting data may occur with considerable frequency, introducing further complexities to the problem. Indexes are managed by the database administrator (DBA), who has the knowledge about the query workload to create an efficient index configuration. As the query workload changes, the DBA must re-evaluate the index configuration. To reduce the burden on the DBA, various algorithms have been proposed to automate the process of tuning database indexes in classical computing. These include algorithms that use supervised machine learning techniques to learn what indexes have been used and how queries have been performed in the past from the given training data and predict what indexes should be created for the new query workload. Since training data are often difficult to obtain, there are index tuning algorithms that make use of reinforcement learning, which does not depend on training data and learns as it goes [3,4,5,6,7,8,9,10,11,12,13,14,15,16].
Quantum computing is an emerging technology that transforms the way information is processed, offering significant potential advantages over classical systems enabled by quantum theory principles such as superposition and entanglement. This has been verified by Shor’s algorithm [17], capable of factoring integers into primes in polynomial time, and by the quantum search algorithm proposed by Lov Grover [18], which can search unstructured data with complexity O(√N), where N is the number of elements in the search space. Additionally, quantum computing has been investigated with promising results in several areas of database management, including query optimization and transaction scheduling [19], which are two other NP-hard problems.
However, an analysis of the current state of the art reveals that there are no studies that implement quantum reinforcement learning strategies with experimental results in the process of automating index tuning. To address this gap, in this paper, an existing index tuning algorithm for classical computers is implemented and a quantum–classical (hybrid) version, called Quantum Reinforcement Learning for database Index Tuning (QRLIT), that employs the capabilities of Grover’s search is proposed. The primary objective is to compare the performance of the hybrid algorithm against its classical counterpart.
The implemented classical algorithm [3,20] employs a machine learning technique called reinforcement learning [21]. It is composed of two principal elements: an agent and an environment. The agent learns to make decisions through interactions with the environment using a trial-and-error learning method. The classical index tuning algorithm employs a technique called Epsilon-greedy [21] to balance the agent’s tendency to explore against following its learned policy (exploiting). The proposed hybrid model replaces this technique with Grover’s search algorithm, which enables a probabilistic approach and a natural balancing of the exploration–exploitation duality through the manipulation of the number of iterations.
This paper contributes a novel algorithm that combines quantum computing with reinforcement learning to automate the process of database index tuning. Furthermore, a series of experiments demonstrate the advantages of using quantum computing over traditional systems. The results obtained indicate that QRLIT converges faster to an optimal policy and is able to produce a higher reward in terms of queries processed per hour than its classical counterpart.
The rest of the paper is organized as follows: Section 2 provides some background information. Section 3 presents an overview of the state of the art, and the classical index tuning algorithm with its quantum counterpart implementation is described in Section 4. The experimental environment and results obtained from running both algorithms are detailed in Section 5. Finally, Section 6 concludes the paper and proposes directions for future work.

2. Background

The purpose of this section is to provide the necessary context and foundations to understand the quantum–classical implementation. This background explains reinforcement learning, the quantum computing foundations, and Grover’s quantum search algorithm.

2.1. Reinforcement Learning

In artificial intelligence, reinforcement learning is a branch inspired by the natural process of learning through reinforcement. Entities known as agents learn a policy π that maps states of the environment to actions with the purpose of maximizing the value of accumulated rewards over time in a stochastic environment modeled by a Markov decision process (MDP) [21]. An MDP is defined by a tuple with five elements (S, A, P, R, γ), where S represents the state space, A the action space, P the state transition function defining the dynamics of the MDP, R the reward function, and γ a discount factor with 0 ≤ γ ≤ 1 [21].
Q-learning is a model-free algorithm used to solve reinforcement learning problems based on temporal difference (TD) learning methods [21]. These methods involve learning to make optimal decisions directly from experiences without a model of the environment’s dynamics [21]. The core idea behind this algorithm is to learn a tabular policy, known as a Q-table, which stores the value of each action for each state. These values, called Q-values, represent the quality of each action in a specific state; in other words, how effective that action is in obtaining a good reward. So, the greater the value, the higher the potential reward and the better the action is considered.
As a fundamental step in this algorithm, after the agent executes an action and receives feedback from the environment (reward and new state), it must update its policy. This update uses Equation (1), where Q(s, a) represents the Q-value of action a executed in state s, α is the learning rate, r is the reward obtained, γ is the discount rate that influences the impact of future rewards, and max_{a′} Q(s′, a′) represents the Q-value of the action with the highest value in the new state s′ of the environment.
TD Error = r + γ · max_{a′} Q(s′, a′) − Q(s, a)
Q(s, a) ← Q(s, a) + α · TD Error
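As an illustration of Equation (1), a minimal tabular Q-learning update could be written as follows (a Python sketch with arbitrary state and action counts, not taken from any of the implementations discussed later):

```python
import numpy as np

# Hypothetical sizes: 4 states, 3 actions; Q-table initialized to zero.
num_states, num_actions = 4, 3
Q = np.zeros((num_states, num_actions))

def q_update(Q, s, a, r, s_next, alpha=0.1, gamma=0.9):
    """Apply the temporal-difference update of Equation (1)."""
    td_error = r + gamma * np.max(Q[s_next]) - Q[s, a]
    Q[s, a] += alpha * td_error
    return td_error

# Example: the agent took action 1 in state 0, got reward 5.0, landed in state 2.
td = q_update(Q, s=0, a=1, r=5.0, s_next=2)
print(Q[0, 1], td)  # 0.5 5.0
```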
Q-learning requires balancing exploration and exploitation, as it is important to explore the environment’s states to prevent the agent from getting stuck in a local maximum. To find this balance, the algorithm can use a strategy known as Epsilon-greedy [21]. This strategy uses an exploration rate epsilon ε, which decreases at the end of each episode. With this strategy (Equation (2)), the agent explores with probability ε or follows the learned policy (exploits) with probability 1 − ε.
a = argmax_{a∈A} Q(s, a),   with probability 1 − ε
a = random action a ∈ A,    otherwise
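A compact sketch of the Epsilon-greedy rule in Equation (2) (illustrative only; Q is a NumPy Q-table as in the previous sketch):

```python
import numpy as np

def epsilon_greedy(Q, s, epsilon, rng=np.random.default_rng()):
    """Return a random action with probability epsilon, else the greedy one."""
    num_actions = Q.shape[1]
    if rng.random() < epsilon:
        return int(rng.integers(num_actions))   # explore
    return int(np.argmax(Q[s]))                 # exploit
```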

2.2. Quantum Computing

Based on the principles of quantum theory, such as superposition and entanglement, quantum computing offers great advantages over classical computing [17,18]. This section is organized into two subsections that introduce and describe the building blocks of quantum computing. Section 2.2.1 introduces the system’s basic units of information and their mathematical representation. Section 2.2.2 then introduces the quantum logic gates, which perform operations on these information units.

2.2.1. Information Unit

In the field of quantum computing, the fundamental unit of information is a quantum bit, or qubit. Similarly to classical bits, qubits operate in a two-level system, corresponding to states 0 and 1. However, in contrast to bits, which exist in a single state at a time, qubits can be simultaneously in both. This phenomenon, which is paradoxical from the perspective of classical physics, is known as superposition. According to quantum theory, the precise state of a qubit in a superposition can only be identified through an observation or measurement, at which point it will collapse to one of its fundamental states, either 0 or 1, with a certain probability [22].
Mathematically, the state of a qubit is described in Dirac notation as a linear combination of the base states |0⟩ and |1⟩, as illustrated in Equation (3). The complex coefficients α and β represent the amplitudes of each state. The base states, designated by the symbols |0⟩ and |1⟩, are described by the expressions presented in Equation (4). For the base states themselves, the amplitudes are either 0 or 1. In a superposition, however, α and β can take arbitrary values whose squared magnitudes lie anywhere in the range [0, 1].
|ψ⟩ = α|0⟩ + β|1⟩ = [α, β]ᵀ
|0⟩ = [1, 0]ᵀ,   |1⟩ = [0, 1]ᵀ
A qubit can be represented on a sphere known as the Bloch sphere [22]. In this model, the amplitudes of a quantum state are expressed in spherical coordinates, as described in Equation (5).
|ψ⟩ = cos(θ/2)|0⟩ + (cos ϕ + i·sin ϕ)·sin(θ/2)|1⟩
Quantum state amplitudes, also known as probability amplitudes, define the probability of a qubit in superposition being observed in the state |0⟩ or |1⟩. The probability of finding the qubit in the state |0⟩ is calculated using Equation (6), while the probability of finding it in the state |1⟩ is determined by Equation (7). These two probabilities always sum to one, as expressed by the normalization condition in Equation (8).
P(|0⟩) = |α|²
P(|1⟩) = |β|²
|α|² + |β|² = 1
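Equations (6)–(8) can be checked numerically; the amplitudes below are an arbitrary example:

```python
import numpy as np

# Arbitrary example state |psi> = alpha|0> + beta|1> (amplitudes may be complex).
alpha, beta = 0.6, 0.8j
p0, p1 = abs(alpha) ** 2, abs(beta) ** 2   # Equations (6) and (7)
assert np.isclose(p0 + p1, 1.0)            # normalization, Equation (8)
print(p0, p1)  # 0.36 0.64
```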
In order to allow for the encoding of more complex information in any computing system, it is essential to combine multiple units. In quantum computing, this combination is achieved through the tensor product of qubits. Equation (9) contains the notation for a system of two qubits, while Equation (10) presents the result of their tensor product.
|ψ⟩ ⊗ |ω⟩ ≡ |ψ⟩|ω⟩ ≡ |ψω⟩ ≡ |ψ, ω⟩
|ψ⟩ ⊗ |ω⟩ = (α|0⟩ + β|1⟩) ⊗ (γ|0⟩ + δ|1⟩) = [α, β]ᵀ ⊗ [γ, δ]ᵀ = [αγ, αδ, βγ, βδ]ᵀ = αγ|00⟩ + αδ|01⟩ + βγ|10⟩ + βδ|11⟩
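The two-qubit composition of Equation (10) corresponds to a Kronecker product; a small NumPy sketch with arbitrary amplitudes:

```python
import numpy as np

# |psi> = alpha|0> + beta|1>, |omega> = gamma|0> + delta|1>
psi = np.array([1, 1]) / np.sqrt(2)   # equal superposition
omega = np.array([1, 0])              # state |0>
state = np.kron(psi, omega)           # amplitudes of |00>, |01>, |10>, |11>
print(state)  # [0.7071 0. 0.7071 0.]
```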

2.2.2. Quantum Logic Gates

The ability to manipulate and control the amplitudes of the states of qubits is a fundamental prerequisite for the implementation of a quantum computing process. This manipulation is performed through quantum logic gates, or simply quantum gates, which allow for the creation of quantum algorithms [22].
An operation is defined by a unitary matrix that transforms one quantum state into another through matrix multiplication. Equation (11) provides a mathematical demonstration of this process, where U represents the operation in question, |ψ1⟩ the initial state, and |ψ2⟩ the resulting state [22].
U|ψ1⟩ = |ψ2⟩
A quantum gate that acts on several qubits is described by a matrix of dimensions 2^n × 2^n, where n represents the number of qubits. The most common quantum gates are Pauli-X, Pauli-Z, Hadamard, Controlled NOT (CNOT or CX), and Controlled-Z, whose matrices are shown, in that order, in Equation (13). The Pauli-X gate performs state negation, which is equivalent to a NOT gate in classical computers, and the Pauli-Z gate, also known as a phase-flip gate, transforms the state |1⟩ into −|1⟩. The Hadamard gate places the qubit in superposition, mapping the base states as presented in Equation (12). The Controlled NOT negates the target qubit conditioned on the state of a control qubit; in other words, the negation is applied only if the control qubit is in state |1⟩. Finally, the Controlled-Z behaves in the same way as the Controlled NOT, except that a phase flip is performed instead of a negation.
H|0⟩ = (|0⟩ + |1⟩)/√2,   H|1⟩ = (|0⟩ − |1⟩)/√2
Pauli-X = [[0, 1], [1, 0]],   Pauli-Z = [[1, 0], [0, −1]],   H = (1/√2)·[[1, 1], [1, −1]],
CNOT = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 0, 1], [0, 0, 1, 0]],   CZ = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, −1]]
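These gates can be applied to state vectors by ordinary matrix multiplication, as in Equation (11); the following NumPy sketch verifies the Hadamard mapping of Equation (12) and the CNOT action on |10⟩ (illustrative only):

```python
import numpy as np

X = np.array([[0, 1], [1, 0]])
Z = np.array([[1, 0], [0, -1]])
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)
CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]])

ket0 = np.array([1, 0])
ket1 = np.array([0, 1])
print(H @ ket0)                    # (|0> + |1>)/sqrt(2)
print(H @ ket1)                    # (|0> - |1>)/sqrt(2)
print(CNOT @ np.kron(ket1, ket0))  # |10> -> |11> (control is the left/first factor)
```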

2.3. Grover’s Search Algorithm

Quantum computing can speed up various search processes on unordered data thanks to the superposition of quantum states, which allows for quantum parallelism. In 1996, a search algorithm that uses these quantum properties was proposed by Lov Grover [18]. Grover’s algorithm evaluates whether a given solution, called the “good state”, is contained in a domain of N possible solutions. By increasing the probability of the “good state” and reducing the probability of the remaining ones, it performs the search with a time complexity of O(√N), a great advantage over the classical approach with a time complexity of O(N).
The algorithm is built with three main layers (Figure 1), each encapsulating a different function. The initial layer, designated State Preparation, initiates the process by placing all qubits into a superposition state. The Oracle, representing the second layer, encodes the “good state” and flips its sign (phase shift) through a combination of multi-controlled Pauli-Z and Pauli-X gates [23]. Finally, the Amplification Layer, or Diffusion Operator, serves as the third layer and uses a combination of Hadamard, multi-controlled Pauli-X, and Pauli-X gates [23]. Its function is to apply another phase shift and amplify the probability of observing the “good state” upon measurement.
Figure 1. Circuit diagram of Grover’s algorithm layers (based on [24]).
Following Grover’s definition, to achieve the maximum probability of measuring the good state, we need to repeat layers two and three t times (Equation (14)) when there is a single solution [24].
t = int((π/4)·√N − 1/2),   N = 2^(number of qubits)
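For readers who want to experiment with these three layers, the following is a minimal, self-contained Qiskit sketch (assuming the qiskit and qiskit-aer packages are installed; the three-qubit register and the marked state '101' are arbitrary choices for illustration and are unrelated to the QRLIT circuit described later):

```python
import numpy as np
from qiskit import QuantumCircuit, transpile
from qiskit_aer import AerSimulator

def oracle(qc, marked):
    """Phase-flip the basis state given by the bit string `marked`."""
    n = len(marked)
    for i, bit in enumerate(reversed(marked)):   # Qiskit orders qubits little-endian
        if bit == '0':
            qc.x(i)
    qc.h(n - 1)
    qc.mcx(list(range(n - 1)), n - 1)            # multi-controlled Z via H-MCX-H
    qc.h(n - 1)
    for i, bit in enumerate(reversed(marked)):
        if bit == '0':
            qc.x(i)

def diffuser(qc, n):
    """Amplify the amplitude of the marked state (inversion about the mean)."""
    qc.h(range(n))
    qc.x(range(n))
    qc.h(n - 1)
    qc.mcx(list(range(n - 1)), n - 1)
    qc.h(n - 1)
    qc.x(range(n))
    qc.h(range(n))

n, marked = 3, '101'
t = int(np.pi / 4 * np.sqrt(2 ** n) - 0.5)       # Equation (14)
qc = QuantumCircuit(n)
qc.h(range(n))                                   # State Preparation: uniform superposition
for _ in range(max(t, 1)):
    oracle(qc, marked)
    diffuser(qc, n)
qc.measure_all()

sim = AerSimulator()
counts = sim.run(transpile(qc, sim), shots=1024).result().get_counts()
print(counts)                                    # '101' should dominate the counts
```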

4. QRLIT: Quantum–Classical Implementation of Classical SMARTIX Algorithm

This section describes the implementation of our QRLIT algorithm, a hybrid quantum–classical version of SMARTIX. Initially, we present a method for combining quantum computing (QC) with reinforcement learning (RL), which serves as the basis for the development of QRLIT. Then, we provide a description of the process used to identify the components that were converted. Finally, we demonstrate and compute the fundamental values necessary for the construction of the quantum circuit and conclude with the QRLIT flow diagram and pseudocode.
Quantum reinforcement learning (QRL) is a method that combines the capabilities of QC and RL. Similarly to its classical counterpart, quantum reinforcement learning also includes a policy, a state space, an action space, and a reward function but is inspired by the superposition principle and quantum parallelism [33]. Based on the novel algorithm proposed in [33] for QRL, the authors of the paper [34] propose an algorithm called quantum Q-learning (QQRL) that stores the policy in a superposition state and uses Grover’s algorithm as a strategy to amplify the probability amplitude of the best action based on the learned policy. Grover’s algorithm exploits the natural behavior of superposition states and offers a good balance between exploration and exploitation. This balance can be achieved by controlling the number of Grover iterations L through the learning process of the agent. In other words, as the agent learns and the number of iterations increases, the capacity to explore decreases until reaching a number of iterations t (as defined in Equation (14)), which maximizes the probability of measuring the “good” action. The number of iterations L is determined by the formula in Equation (20) from [34], where k represents a rate that controls the proportion of policy and reward contributions and t denotes the maximum number of possible iterations.
L = min( int( k · (r + max_{a′} Q(s′, a′)) ), t )
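As a small illustration, Equation (20) reduces to a few lines of Python; the values of k, the reward, and the Q-values below are hypothetical:

```python
import numpy as np

def grover_iterations(k, reward, q_next, t_max):
    """Equation (20): scale reward + best next-state Q-value by k, capped at t_max."""
    return min(int(k * (reward + np.max(q_next))), t_max)

# Hypothetical values: k = 0.001, reward of 5000 QphH, next-state Q-values, t = 8.
print(grover_iterations(0.001, 5000.0, np.array([1200.0, 900.0, 1500.0]), 8))  # 6
```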
Our QRLIT implementation is based on the QQRL algorithm. Accordingly, we replace the Epsilon-greedy procedure with Grover’s search algorithm while keeping the remaining elements in the classical system. With Grover’s algorithm in QRLIT, we can not only determine the actions to be executed in the environment but also naturally balance the agent’s duality between exploration and exploitation. As outlined in the Background Section, Grover’s algorithm contains three distinct layers, namely State Preparation, Oracle, and Amplitude Amplification. In the State Preparation layer, we initialize the agent’s policy in a superposition state; in the Oracle, we encode the action with the highest Q-value in the current state of the environment. As the last layer, we implement the Amplitude Amplification, which amplifies the probability amplitude of measuring the action encoded in the Oracle.
To build and run Grover’s algorithm, it is crucial to identify how many qubits are required in the quantum register to encode the actions. We calculate the number of qubits using the formula N_a ≤ 2^n ≤ 2·N_a, presented by the authors of [33]. In this formula, N_a represents the size of the action space, while n denotes the number of qubits required to encode an action. We apply and solve the formula for a space of 90 actions and round up, so that n equals seven (Equation (21)). We then define the maximum number of Grover iterations, t. As the number of qubits is already known, N equals 128 and therefore t equals eight (Equation (22)).
2^n ≥ N_a  ⇒  n = ⌈log₂(N_a)⌉ = ⌈log₂(90)⌉ = ⌈6.491⌉ = 7
N = 2^(number of qubits) = 2^7 = 128;   t = int((π/4)·√128 − 1/2) = int(8.386) = 8
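These two quantities can be verified with a short calculation (using only the numbers given above):

```python
import math

num_actions = 90
n = math.ceil(math.log2(num_actions))          # Equation (21): 7 qubits
N = 2 ** n                                     # 128 basis states
t = int(math.pi / 4 * math.sqrt(N) - 0.5)      # Equation (22): 8 iterations
print(n, N, t)                                 # 7 128 8
```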
As identified before, to create QRLIT, we replace the Epsilon-greedy strategy in the classical algorithm with Grover’s search (line 6 in Algorithm 1). Figure 2 illustrates the interactions between the principal components of QRLIT. The agent component initiates the interaction by executing Grover’s algorithm, which returns an action a as a binary string; this is converted to a decimal value and executed in the environment (line 7 in Algorithm 1). The environment then processes the action and enters a new state s′. In this new state, the benchmark is prepared by creating the necessary query instances using QGen to run the power and throughput tests. Once the benchmark has been executed, the reward r and the new state s′ of the environment are returned to the agent. With these two values, the agent calculates the number of Grover iterations L, selects the action of the new state with the highest Q-value, and passes these values to the operation that builds the Grover circuit (line 8 in Algorithm 1). The quantum circuit is then constructed with all seven qubits of the register initialized in the state |0⟩. Our QRLIT proposal provides a natural balance between exploration and exploitation, allowing for more effective learning; as the agent learns and adjusts its policy, the exploration rate decreases (Equation (20)). Furthermore, given the properties of quantum parallelism and superposition states in Grover’s algorithm, this proposal offers another advantage: it is able to find an action faster (complexity O(√N), line 6 in Algorithm 1) than its classical counterpart (complexity O(N)).
Algorithm 1 QRLIT algorithm with Grover’s search, function approximation, and experience replay. Adapted from [3,34].
1: Random initialization of parameters Θ
2: Empty initialization of replay memory D
3: for each episode do
4:     s ← DB initial index configuration mapped as initial state
5:     for each step of episode do
6:         a ← Run Grover’s algorithm on s
7:         s′, r ← execute(a)
8:         Build Grover’s circuit with L and argmax_{a∈A} Q(s′, a)
9:         for each θ_i ∈ Θ do
10:            Update θ_i according to Equation (19)
11:        end for
12:        Store experience e = (s, a, r, s′) in D
13:        Sample random mini-batch of experiences e ~ D
14:        Perform experience replay using the sampled data
15:        s ← s′
16:    end for
17: end for
Figure 2. Flow diagram of QRLIT algorithm.
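To make the control flow of Algorithm 1 and Figure 2 concrete, the following toy sketch emulates it in plain Python: the Grover measurement and the TPC-H benchmark are replaced by random stand-ins, and all names and constants (NUM_ACTIONS, T_MAX, K) are ours rather than the authors’:

```python
import numpy as np

# Toy stand-ins (NOT the authors' environment): 8 actions, random rewards, tabular Q.
rng = np.random.default_rng(0)
NUM_ACTIONS, T_MAX, K = 8, 2, 0.01
Q = np.zeros((1, NUM_ACTIONS))           # single-state toy policy, for illustration only

def run_grover(best_action, L):
    """Stand-in for measuring the Grover circuit: more iterations L make the
    marked (best) action more likely to be returned."""
    p_best = min(0.5 + 0.25 * L, 1.0)
    if rng.random() < p_best:
        return best_action
    return int(rng.integers(NUM_ACTIONS))

def execute_action(action):
    """Stand-in for running the TPC-H power/throughput tests after changing indexes."""
    return float(rng.normal(loc=100 + 10 * action, scale=5))   # fake QphH reward

best_action, L = 0, 0
for step in range(20):
    action = run_grover(best_action, L)                        # line 6 of Algorithm 1
    reward = execute_action(action)                            # line 7
    td = reward + 0.9 * np.max(Q[0]) - Q[0, action]
    Q[0, action] += 0.1 * td                                   # policy update
    L = min(int(K * (reward + np.max(Q[0]))), T_MAX)           # Equation (20)
    best_action = int(np.argmax(Q[0]))                         # argmax for the oracle (line 8)
print(Q[0], L, best_action)
```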

5. Performance Evaluation

This section presents the experiments conducted on the classical algorithm SMARTIX and its quantum–classical version QRLIT (source code in [35]), as well as the analyses performed on the results. It is organized into three subsections. Section 5.1 provides a detailed description of the environment used to execute the experiments, and Section 5.2 and Section 5.3 present the experimental results and their analysis.

5.1. Experimental Model

All experiments were conducted in a Docker container running Ubuntu 22.04 on a 2021 MacBook Pro equipped with 16 GB of RAM, 1 TB of disk space, and an Apple M1 Pro CPU with 10 cores. MySQL was used as the DBMS on which the TPC-H benchmark was implemented, while a simulator provided by the Qiskit SDK was used to build and execute the quantum algorithm.
Additionally, in accordance with the TPC-H benchmark specification [27], 22 query instances were executed in the Power metric and 44 in the Throughput metric with two parallel streams (22 queries for each stream). This resulted in a total of 66 query instances being executed in each time step. The queries were generated through a tool provided by the TPC-H benchmark, designated as QGen.
The experiments were carried out according to the parameter settings outlined in Table 1. The first parameter setting corresponds to the tests conducted in Section 5.2 to study the overall performance of the algorithms when the database size is fixed at 10 MB, while the second parameter setting corresponds to Section 5.3 to study the impact of database sizes of 10 MB, 20 MB, 30 MB, 40 MB, 70 MB, and 100 MB on the performance of the algorithms. In this second configuration, the number of episodes was reduced to 25 in order to reduce the time required to execute the experiments.
Table 1. Configuration parameters for the tests.

5.2. Overall Performance

In this section, experiments were conducted to study the overall performance of the two algorithms when the database size is fixed at 10 MB. This study is based on the following metrics: number of queries processed per hour, episode execution time, temporal difference error, and number of Grover iterations. The first metric captures the quality of the algorithms in terms of their ability to identify a policy that maximizes the cumulative reward (queries per hour) over time. The episode execution time metric measures how quickly the algorithms execute an episode. The temporal difference error (TD Error) metric (Equation (1)) indicates the algorithm’s convergence to an optimal policy; in other words, the closer the values are to 0, the better the policy. Finally, the Grover iterations metric measures the relation between exploration and exploitation: the lower the value, the higher the rate of exploration relative to exploitation. This metric allows for the analysis of the agent’s exploration capacity, which is directly correlated with the number of iterations.
The results obtained for each metric in each of the 50 episodes for the two algorithms are shown in Figure 3, Figure 4, Figure 5 and Figure 6. The average results of each metric over the 50 episodes of the two algorithms are summarized in Table 2. The analysis of Figure 3 and Table 2 reveals that, on average, the hybrid algorithm processes 0.61% more queries per hour than its classical counterpart.
Figure 3. Number of queries processed per hour in each episode.
Figure 4. Average temporal difference error in each episode.
Figure 5. Grover iterations in each episode with hybrid algorithm.
Figure 6. Total execution time in each episode.
Table 2. Comparison results of the average results of the classical and quantum–classical algorithms for a database size of 10 MB.
Moreover, the analysis of Figure 4 shows that the hybrid algorithm converges much faster to a low temporal difference error, exhibiting more stable learning and a temporal difference error trajectory closer to 0 (Table 2).
To find a balance between exploring and exploiting, the classical algorithm implements a strategy known as Epsilon-greedy. This strategy uses an exploration rate epsilon ε = 0.9 , which decreases with an exploration discount factor of 0.1 at the end of each episode. Therefore, with this algorithm, the agent explores with a probability of ε or follows the learned policy (exploits) with a probability of 1 ε . In the case of the quantum–classical algorithm, as the agent learns and adjusts its policy, the number of Grover iterations also increases, consequently reducing the exploration probability (Equation (20)) (Figure 5).
The quantum–classical algorithm provides a better index recommendation, resulting in a higher number of queries processed per hour than the classical algorithm because as the agent of the quantum–classical algorithm refines its policy through learning, the exploration rate decreases. This leads to a decrease in unnecessary explorations and allows for more effective learning. However, the quantum–classical algorithm takes on average 21.67% more time to complete an episode than its classical counterpart (Figure 6). This discrepancy is related to the additional computational overhead to create and execute the quantum circuit at each time step. Figure 7 shows the time required to build and execute Grover’s algorithm.
Figure 7. The average execution times to build and run the Grover algorithm for each Grover iteration in 20 executions with the quantum–classical algorithm.

5.3. Impact of Database Sizes

The purpose of this section is to examine the behavior of the algorithms across a range of database sizes, specifically 10 MB, 20 MB, 30 MB, 40 MB, 70 MB, and 100 MB. The metrics employed in this analysis include the average number of queries processed per hour of the 25 episodes for each database size, the number of Grover iterations, and the temporal difference error of each algorithm.
The results obtained for each metric and each database size for the two algorithms are shown in Figure 8, Figure 9, Figure 10, Figure 11 and Figure 12. The average results of each metric over the database sizes of the two algorithms are summarized in Table 3. The analysis across database sizes also indicates the superiority of the hybrid algorithm. The results in Table 3 and Figure 8 show that, on average, the hybrid algorithm processes 2.49% more queries per hour than its classical counterpart and displays a temporal difference error trajectory closer to 0 (Figure 9). This trajectory is more evident in this analysis because the number of episodes is reduced by half, highlighting the importance of faster convergence.
Figure 8. Impacts of database size on average number of queries processed per hour.
Figure 9. Impacts of database size on average temporal difference error.
Figure 10. Impacts of database size on average number of Grover iterations in quantum–classical algorithm.
Figure 11. Average probability distribution of actions in 1024 shots for each Grover iteration in 10 executions for quantum–classical algorithm.
Figure 12. Impact of database size on total execution time.
Table 3. Comparison results of the classical algorithm and the quantum–classical algorithm with different database sizes.
Furthermore, it can also be observed that the number of queries processed per hour decreases as the database size increases. This indicates that the size of the database affects the QphH obtained, in other words, the reward. Since the Q-values are directly related to the reward, they also become smaller and, according to Equation (20), which calculates the number of Grover iterations, so does the number of iterations (Figure 10), which in turn increases the exploration rate (Figure 11). Excessive exploration causes the agent not to follow the learned policy, resulting in a mostly random index configuration as the database size increases.
In conclusion, besides the average superiority of the quantum–classical algorithm, the results in Figure 10 also demonstrate the need to adjust the parameter k, which regulates the reward and policy contributions to the number of Grover iterations. In this case, as the reward value decreases, it is necessary to increase the value of k to reduce the exploration rate.

6. Conclusions and Future Work

This work presents the implementation of QRLIT, a hybrid quantum–classical version of SMARTIX [3]. QRLIT demonstrated better performance than its classical counterpart in terms of the number of queries processed per hour and converged faster to an optimal policy. By controlling the Grover iterations through the reward and the agent’s policy, the exploration rate decreases as the agent refines its policy through learning, allowing the temporal difference error to converge closer to zero and yielding more effective learning than the classical counterpart. However, because the value of k controls the contribution of the reward and policy to the number of Grover iterations, increasing the database size makes it necessary to adjust this parameter manually to balance the exploration rate. Such manual adjustment in an automatic system is a limitation, because the reward (QphH) varies not only with the size of the database but also with the quality and capacity of the machine’s hardware.
As future work, we intend to analyze the behavior of the algorithms in databases with significant sizes and more queries. It would also be important to investigate their performance on distributed database systems. Finally, evaluating the execution of the quantum–classical algorithm on a real quantum computer is another direction for future research.

Author Contributions

Conceptualization, L.G., J.B. and L.D.; Methodology, L.G., J.B. and L.D.; Software, D.B.; Validation, D.B.; Investigation, D.B.; Resources, D.B.; Writing—original draft, D.B.; Writing—review & editing, L.G., J.B. and L.D.; Visualization, J.B.; Supervision, L.G., J.B. and L.D.; Project administration, L.G., J.B. and L.D.; Funding acquisition, J.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported in part by the National Science Foundation under grant no. 2425838.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Consens, M.P.; Ioannidou, K.; LeFevre, J.; Polyzotis, N. Divergent physical design tuning for replicated databases. In Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, Scottsdale, AZ, USA, 20–24 May 2012; pp. 49–60. [Google Scholar]
  2. Gruenwald, L.; Winker, T.; Çalikyilmaz, U.; Groppe, J.; Groppe, S. Index Tuning with Machine Learning on Quantum Computers for Large-Scale Database Applications. In Proceedings of the 49th International Conference on Very Large Data Bases (VLDB 2023), Vancouver, BC, Canada, 28 August–1 September 2023. [Google Scholar]
  3. Licks, G.P.; Couto, J.C.; Miehe, P.d.F.; de Paris, R.; Ruiz, D.D.; Meneguzzi, F. SmartIX: A database indexing agent based on reinforcement learning. Appl. Intell. 2020, 50, 2575–2588. [Google Scholar] [CrossRef]
  4. Basu, D.; Lin, Q.; Chen, W.; Vo, H.T.; Yuan, Z.; Senellart, P.; Bressan, S. Cost-Model Oblivious Database Tuning with Reinforcement Learning. In Database and Expert Systems Applications; Springer International Publishing: Cham, Switzerland, 2015; pp. 253–268. [Google Scholar]
  5. Sharma, A.; Schuhknecht, F.M.; Dittrich, J. The Case for Automatic Database Administration using Deep Reinforcement Learning. arXiv 2018, arXiv:1801.05643. [Google Scholar]
  6. Lan, H.; Bao, Z.; Peng, Y. An Index Advisor Using Deep Reinforcement Learning. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, New York, NY, USA, 19–23 October 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 2105–2108. [Google Scholar]
  7. Sadri, Z.; Gruenwald, L.; Lead, E. DRLindex: Deep reinforcement learning index advisor for a cluster database. In Proceedings of the 24th Symposium on International Database Engineering & Applications, New York, NY, USA, 12–18 August 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 1–8. [Google Scholar]
  8. Sharma, V.; Dyreson, C.; Flann, N. MANTIS: Multiple Type and Attribute Index Selection using Deep Reinforcement Learning. In Proceedings of the 25th International Database Engineering & Applications Symposium, New York, NY, USA, 14–16 July 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 56–64. [Google Scholar]
  9. Yan, Y.; Yao, S.; Wang, H.; Gao, M. Index selection for NoSQL database with deep reinforcement learning. Inf. Sci. 2021, 561, 20–30. [Google Scholar] [CrossRef]
  10. Welborn, J.; Schaarschmidt, M.; Yoneki, E. Learning Index Selection with Structured Action Spaces. arXiv 2019, arXiv:1909.07440. [Google Scholar]
  11. Kossmann, J.; Kastius, A.; Schlosser, R. SWIRL: Selection of Workload-aware Indexes using Reinforcement Learning. In Proceedings of the International Conference on Extending Database Technology, Edinburgh, UK, 29 March–1 April 2022; pp. 2–155. [Google Scholar]
  12. Wu, W.; Wang, C.; Siddiqui, T.; Wang, J.; Narasayya, V.; Chaudhuri, S.; Bernstein, P.A. Budget-aware Index Tuning with Reinforcement Learning. In Proceedings of the 2022 International Conference on Management of Data, New York, NY, USA, 12–17 June 2022; Association for Computing Machinery: New York, NY, USA, 2022; pp. 1528–1541. [Google Scholar]
  13. Zhou, X.; Liu, L.; Li, W.; Jin, L.; Li, S.; Wang, T.; Feng, J. AutoIndex: An Incremental Index Management System for Dynamic Workloads. In Proceedings of the 2022 IEEE 38th International Conference on Data Engineering (ICDE), Kuala Lumpur, Malaysia, 9–12 May 2022; pp. 2196–2208. [Google Scholar]
  14. Lai, S.; Wu, X.; Wang, S.; Peng, Y.; Peng, Z. Learning an Index Advisor with Deep Reinforcement Learning. In Web and Big Data; Springer International Publishing: Berlin/Heidelberg, Germany, 2021; pp. 178–185. [Google Scholar]
  15. Perera, R.M.; Oetomo, B.; Rubinstein, B.I.P.; Borovica-Gajic, R. DBA bandits: Self-driving index tuning under ad-hoc, analytical workloads with safety guarantees. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, 19–22 April 2021; pp. 600–611. [Google Scholar]
  16. Perera, R.M.; Oetomo, B.; Rubinstein, B.I.P.; Borovica-Gajic, R. HMAB: Self-driving hierarchy of bandits for integrated physical database design tuning. Proc. VLDB Endow. 2022, 16, 216–229. [Google Scholar] [CrossRef]
  17. Shor, P. Algorithms for quantum computation: Discrete logarithms and factoring. In Proceedings of the 35th Annual Symposium on Foundations of Computer Science, Santa Fe, NM, USA, 20–22 November 1994; pp. 124–134. [Google Scholar]
  18. Grover, L.K. Quantum Mechanics Helps in Searching for a Needle in a Haystack. Phys. Rev. Lett. 1997, 79, 325–328. [Google Scholar] [CrossRef]
  19. Çalikyilmaz, U.; Groppe, S.; Groppe, J.; Winker, T.; Prestel, S.; Shagieva, F.; Arya, D.; Preis, F.; Gruenwald, L. Opportunities for Quantum Acceleration of Databases: Optimization of Queries and Transaction Schedules. Proc. VLDB Endow. 2023, 16, 2344–2353. [Google Scholar] [CrossRef]
  20. Matczak, M.; Czochański, T. Intelligent Index Tuning Using Reinforcement Learning. In New Trends in Database and Information Systems; Springer Nature: Cham, Switzerland, 2023; pp. 523–534. [Google Scholar]
  21. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction, 2nd ed.; 2018. Available online: http://incompleteideas.net/book/RLbook2020.pdf (accessed on 18 November 2024).
  22. Groppe, S. Quantum Computing. Available online: https://www.ifis.uni-luebeck.de/~groppe/lectures/qc (accessed on 18 November 2024).
  23. IBM. Tutorials: Grover’s Algorithm. Available online: https://learning.quantum.ibm.com/tutorial/grovers-algorithm (accessed on 18 November 2024).
  24. IBM. IBM Quantum Learning: Grover’s Algorithm. Available online: https://learning.quantum.ibm.com/course/fundamentals-of-quantum-algorithms/grovers-algorithm (accessed on 18 November 2024).
  25. Wu, Y.; Zhou, X.; Zhang, Y.; Li, G. Automatic Index Tuning: A Survey. IEEE Trans. Knowl. Data Eng. 2024, 36, 7657–7676. [Google Scholar] [CrossRef]
  26. TPC. Transaction Performance Council Website. 1998. Available online: https://www.tpc.org/ (accessed on 18 November 2024).
  27. TPC. TPC-H Specifications. 2022. Available online: https://www.tpc.org/TPC_Documents_Current_Versions/pdf/TPC-H_v3.0.1.pdf (accessed on 18 November 2024).
  28. Sadri, Z.; Gruenwald, L. A Divergent Index Advisor Using Deep Reinforcement Learning. In Database and Expert Systems Applications; Springer International Publishing: Cham, Switzerland, 2022; pp. 139–152. [Google Scholar]
  29. Trummer, I.; Venturelli, D. Leveraging Quantum Computing for Database Index Selection. In Proceedings of the 1st Workshop on Quantum Computing and Quantum-Inspired Technology for Data-Intensive Systems and Applications, New York, NY, USA, 9–15 June 2024; Association for Computing Machinery: New York, NY, USA, 2024; pp. 14–26. [Google Scholar]
  30. Kesarwani, M.; Haritsa, J.R. Index Advisors on Quantum Platforms. Proc. VLDB Endow. 2024, 17, 3615–3628. [Google Scholar] [CrossRef]
  31. Matczak, M.; Czochański, T. Source Code: Intelligent Index Tuning Using Reinforcement Learning. Available online: https://github.com/Chotom/rl-db-indexing (accessed on 18 November 2024).
  32. Melo, F.S.; Ribeiro, M.I. Q-Learning with Linear Function Approximation. In Learning Theory; Springer: Berlin/Heidelberg, Germany, 2007; pp. 308–322. [Google Scholar]
  33. Dong, D.; Chen, C.; Li, H.; Tarn, T.-J. Quantum Reinforcement Learning. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2008, 38, 1207–1220. [Google Scholar] [CrossRef] [PubMed]
  34. Ganger, M.; Hu, W. Quantum Multiple Q-Learning. Int. J. Intell. Sci. 2019, 9, 89926. [Google Scholar] [CrossRef]
  35. Barbosa, D.; Gruenwald, L.; d’Orazio, L.; Bernardino, J. Source Code: QRLIT. 2024. Available online: https://github.com/DBarbosaDev/QRLIT (accessed on 18 November 2024).
