Dynamic Cost Ant Colony Algorithm to Optimize Query for Distributed Database Based on Quantum-Inspired Approach

A distributed database model can be effectively optimized through query optimization. In such a model, the optimizer attempts to identify the most efficient join order, which minimizes the overall cost of the query plan. Successful query processing largely relies on the methodology implemented by the query optimizer. Many studies note that query processing is an NP-hard problem, especially as queries grow larger. For large queries, heuristic methods cannot cover the whole search space and may fall into a local minimum. This paper examines how a quantum-inspired ant colony algorithm, a hybrid of probabilistic algorithms, can be devised to improve the cost of query joins in distributed databases. Quantum computing has the ability to diversify and expand the search, and can thus cover large query search spaces. This enables the selection of the best trails, which speeds up convergence and helps avoid falling into a local optimum. With such a strategy, the algorithm aims to identify an optimal join order that reduces the total execution time. Experimental results show that the proposed quantum-inspired ant colony offers faster convergence with better outcomes than the classic model.


Introduction
A distributed database is a group of interrelated entities that are physically distributed over a network to improve the performance, reliability, availability, and modularity of distributed systems [1]. Query optimization in databases, centralized or distributed, has been an important problem in commercial and academic fields for a long time [2]. Many approaches to query optimization using different technologies have been discussed, but they suffer from problems of dimensionality and accuracy [3,4]. The importance of optimization arises from the flexibility provided by modern database user interfaces, which allow users to specify complex queries easily. The purpose of the optimizer, in this case, is to determine the best query execution plan (QEP) among many equivalent QEPs: the one that reduces the execution cost with less time complexity and utilizes the minimum resources [5]. With a large number of entities (large queries), the number of equivalent QEPs increases exponentially, and the optimizer cannot explore all the query plans in such a huge search space. In this case, the selection of the best QEP by applying a search strategy is classified as an NP-hard optimization problem [6,7].
The search strategy typically falls into one of three categories: exhaustive search, heuristic-based, or randomized [2,4]. Exhaustive search algorithms have exponential worst-case running time and exponential space complexity, which can lead to an algorithm requiring an infeasible amount of time to optimize large user queries [5]. Since exhaustive algorithms enumerate the entire search space, they always find the optimal plan under the given cost model. The traditional dynamic programming (DP) enumeration algorithm is a popular exhaustive search algorithm, which has been used in a number of commercial database management systems [8].
Heuristic-based algorithms were proposed with the intention of addressing the exponential running time of exhaustive enumeration algorithms. Heuristic-based algorithms follow a particular heuristic or rule to guide the search into a subset of the entire search space [4]. Typically, these algorithms have polynomial worst-case running time and space complexity, but the quality of the plans obtained can be orders of magnitude worse than the best possible plan. Iterative dynamic programming (IDP) is an example of a heuristic-based algorithm [9].
Randomized algorithms consider the search space as a set of points, each of which corresponds to a unique QEP [2]. A set of moves M is defined as a means of transforming one point in the space into another, i.e., a move allows the algorithm to jump from one point to another. If a point p can be reached from a point q using a move m ∈ M, then an edge exists between p and q. Randomized models and algorithms have been applied successfully to several optimization problems. Simulated annealing (SA), iterative improvement (II), and genetic algorithms (GA) have been suggested to optimize large-scale recursive queries [10][11][12].
The ant colony optimization (ACO) algorithm is an apt and effective solution for query optimization in a distributed database because of its features and characteristics, including its robustness, global optimization, parallelism obtained from the ability to act concurrently and independently, and capability to integrate with other methods [3]. To utilize the ACO algorithm for query optimization, the problem is described as a graph G = (N, E), where N and E denote the set of entities (tables) and the relations (edges) between these entities. The edges that link the nodes of G represent the join relations among entities. The purpose of the query optimizer is then to seek the best Hamiltonian path in G.
The most significant advantage of quantum computing is its ability to solve specific problems faster and more efficiently than classical computing, such as problems with a high computational cost [13]. Quantum-inspired exploration procedures employ the ability of parallel processing, by adopting the superposition principle, to overcome the limitations of the classical mechanism and to achieve higher performance [14]. It is noteworthy that superposition is the ability of a quantum system to be in numerous states simultaneously until it is measured. This capacity for parallel processing is customarily employed to solve problems that require the exploration of huge solution spaces. This paper is a substantial extension of our conference paper [15]. Compared with that shorter version, further details of the suggested method are presented, and a more extensive performance evaluation is conducted. In addition, this paper gives a more comprehensive literature review to introduce the background of the offered method and make the paper more self-contained. Therefore, this version of the paper provides a more comprehensive and systematic report of the previous work. This paper investigates how the quantum-inspired ant colony optimization (QIACO) algorithm can be used to address the problem of join query optimization in distributed databases, for search spaces where entities (tables) are not replicated, using total time as the cost model. Because query processing is an NP-hard problem, current traditional approaches suffer, especially as the size of the database increases, from large computational cost, non-convergence to a global optimum, and premature convergence. To solve these problems, the quantum-inspired ant colony (QIACO) paradigm is used in an attempt to reach the optimal query plan.
Here, the quantum-inspired approach is employed to change the seeking procedure used by the classical ant colony algorithm to move from one node to another. Instead of using the probabilistic mechanism while building the ant solution, our algorithm uses the quantum partial negation gate, controlled by pheromone values, to control the ant movement. Our model was tested using a synthetic data set and modified TPC-H benchmark queries. This paradigm is able to improve the slow convergence speed and avoid falling into a local optimum. The results show that our model behaves better than the classical model, especially for queries that contain many entities.
The rest of this paper is organized as follows. Section 2 formulates the ACO algorithm and describes the basics of quantum computing used in the proposed algorithm. Section 3 reviews related work. Section 4 presents the proposed algorithms for query optimization. Section 5 describes the experimental results that evaluate the algorithm. The paper closes with the Future Works and Conclusions sections.

Ant Colony Optimization
Ant colony optimization (ACO) is one of many approaches of swarm intelligence, a field wherein specialists study the collective behavior of insects as an inspiration for algorithms. At first, ACO was utilized to solve the traveling salesman and quadratic assignment problems [16]. Ants solve their problem by traversing the graph that represents it and leaving behind pheromone to guide the remaining ants. Pheromone trails give the ants a chance to cooperate and benefit from the experience of other ants through positive feedback. On the contrary, negative feedback, represented by pheromone evaporation, is needed to avoid stagnation. The first ACO algorithm, called the ant system, was utilized to solve the TSP [16]. Many different ACO techniques and algorithms have been created and suggested since then. One of the main features of ACO is that, at each iteration, the pheromone values are updated by all the ants that have built a solution. The pheromone τ_ij, attached to the edge joining nodes i and j, is updated as follows [16]:

τ_ij ← (1 − ρ) τ_ij + Σ_{k=1}^{m} Δτ_ij^k, (1)

where ρ is the evaporation rate and Δτ_ij^k is the pheromone quantity placed on edge (i, j) by ant k out of m ants:

Δτ_ij^k = Q / L_k if ant k used edge (i, j) in its tour, and 0 otherwise, (2)

where Q is a constant and L_k is the length of the tour created by ant k. When building the solution, the ants choose the next node to visit according to a randomized mechanism. When ant k is at node i and has so far constructed the partial solution s_p, the probability of going to node j is given by:

p_ij^k = τ_ij^α η_ij^β / Σ_{l ∈ N(s_p)} τ_il^α η_il^β, (3)

where N(s_p) is the set of feasible nodes. The relative significance of the pheromone in contrast to the heuristic information η_ij is controlled by the parameters α and β, with η_ij obtained from the distance d_ij by:

η_ij = 1 / d_ij, (4)

where d_ij is the distance or cost from node i to the connected node j.
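As a minimal sketch (not the paper's implementation), Equations (1)-(3) translate into a few lines of Python; `tau`, `eta`, `alpha`, `beta`, `rho`, and `Q` follow the symbols above, and the toy values are illustrative only:

```python
def transition_probs(current, feasible, tau, eta, alpha=1.0, beta=5.0):
    """Eq. (3): probability of moving from `current` to each feasible node."""
    w = {j: (tau[current][j] ** alpha) * (eta[current][j] ** beta) for j in feasible}
    total = sum(w.values())
    return {j: wj / total for j, wj in w.items()}

def update_pheromone(tau, tours, lengths, rho=0.1, Q=1.0):
    """Eqs. (1)-(2): evaporate every trail, then let ant k deposit Q/L_k
    on each edge (i, j) of its tour."""
    n = len(tau)
    for i in range(n):
        for j in range(n):
            tau[i][j] *= (1.0 - rho)
    for tour, L in zip(tours, lengths):
        for i, j in zip(tour, tour[1:]):
            tau[i][j] += Q / L
            tau[j][i] += Q / L   # undirected query graph

tau = [[1.0, 1.0], [1.0, 1.0]]
eta = [[0.0, 2.0], [2.0, 0.0]]
update_pheromone(tau, [[0, 1]], [2.0])
print(tau[0][1])   # 1.0 * 0.9 + 1/2 = 1.4
```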

Quantum-Inspired Evolutionary Algorithms
Quantum-inspired evolutionary algorithms (QIEA) are population-based meta-heuristics that draw inspiration from quantum mechanical principles to improve the efficiency of the search in evolutionary optimization algorithms. The parallelism offered by quantum computing, with its simultaneous evaluation of all represented states, has driven the development of models that integrate features of quantum computing with evolutionary computation [17,18]. These models are designed to execute on classical computers, not on quantum computers, and are categorized as "quantum inspired". One of the early attempts was made by Han and Kim [18], who prepared a general model of QIEA. Rather than the binary, numeric, and symbolic representations that exist in a classical computer, QIEA uses the Q-bit to represent the smallest unit of data. A qubit may be in the "1" state, in the "0" state, or in any superposition of "1" and "0". The qubit's state is described mathematically as [19]:

|ψ⟩ = a|0⟩ + b|1⟩, (5)

where a and b are complex numbers that give the probability amplitudes of the corresponding states [19]. The probability that the qubit is found in the state |0⟩ is |a|², and the probability that it is found in the state |1⟩ is |b|² [19]. Normalization of the state to unity ensures that:

|a|² + |b|² = 1. (6)

The qubit's state can be modified by an operation called a quantum gate. A quantum gate is reversible and can be represented as a unitary operator U acting on the qubit basis states, satisfying U†U = UU† = I, where U† is the complex conjugate transpose of U [19]. There are many quantum gates, such as the NOT gate, the rotation gate, the Hadamard gate, etc. [20]. Figure 1 shows the overall structure of QIEA, where Q(t) is the Q-bit representation of the individuals in the search population at time t, P(t) is the solution acquired by measuring the states of Q(t), and B(t) is the best solution at time t.
More details regarding the complete steps of QIEA can be found in [18].
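As a tiny numerical check of Equations (5) and (6), the amplitudes of an equal superposition satisfy the normalization condition; this is an illustration only, not part of the reviewed algorithms:

```python
import math

# A single qubit |psi> = a|0> + b|1>, stored as its two amplitudes (Eq. 5).
a, b = 1 / math.sqrt(2), 1 / math.sqrt(2)   # equal superposition

# Normalization to unity (Eq. 6): |a|^2 + |b|^2 = 1.
norm = abs(a) ** 2 + abs(b) ** 2

# Measurement yields |0> with probability |a|^2 and |1> with probability |b|^2.
p0, p1 = abs(a) ** 2, abs(b) ** 2
print(norm, p0, p1)   # 1.0, 0.5, 0.5 (up to floating-point rounding)
```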


Problem Definition
A distribution allocation scheme is used in distributed databases to dispense data, which may be spread across different locations. The objective of the query optimizer, in this case, is to provide an execution plan (from the different equivalent plans) that reduces the cost of query execution, based on either response time or total time, to a minimum. The solution space for a query that contains many entities and, at the same time, many database locations grows exponentially. Searching for the best query execution plan then becomes computationally difficult and is classified as an NP-hard optimization problem. Here, finding the best execution plan depends on the search strategy used to explore the solution space.
The query search space can be represented as a graph G = (N, E), where the set of vertices N represents the entities in the query and the set of edges E represents the joins between entities, such that each edge e ∈ E is assigned a cost C_e. Let H be the set of all Hamiltonian cycles in G, a cycle visiting each vertex exactly once. The optimizer's problem is to find the path h ∈ H in G such that the sum of the costs C_e is minimized. Given a set of n entities enumerated as 0, 1, 2, . . . , n − 1 to be joined, with the join cost between entities i and j given by C_ij, we introduce a decision variable y_ij for each pair (i, j) such that:

y_ij = 1 if the join between entities i and j is part of the chosen plan, and y_ij = 0 otherwise.

The objective function in this case is:

minimize Σ_i Σ_j C_ij y_ij.

This objective function is guided by two parameters: the number of entities n and the join cost C, which is affected by the number of database locations, because entities are transferred between different locations.
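For illustration, the objective above reduces to summing C_ij over the consecutive pairs of a candidate join order; the 4-entity cost matrix below is purely hypothetical:

```python
def plan_cost(order, C):
    """Total join cost of one join order: the sum C_ij * y_ij, where y_ij = 1
    exactly for the consecutive pairs chosen by the order."""
    return sum(C[i][j] for i, j in zip(order, order[1:]))

# Hypothetical symmetric cost matrix for four entities (illustrative numbers).
C = [[0, 5, 9, 4],
     [5, 0, 3, 7],
     [9, 3, 0, 6],
     [4, 7, 6, 0]]

print(plan_cost([0, 3, 2, 1], C))   # 4 + 6 + 3 = 13
```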
To establish a solution for the query optimization problem, three important parts should be studied: the search space, the search strategy, and the cost model. The search space refers to the generation of sets of alternative, equivalent QEPs that differ in the execution order of the operators. The search strategy refers to the algorithms applied to explore the search space and determine the best QEP, based on join selectivity and join sites, so as to reduce the cost of query optimization. The cost model refers to the model used to predict the cost of every QEP. In this paper, a quantum-inspired ant colony algorithm is used as the search strategy, depending on a dynamic cost technique [15], to identify the best QEP. Here, the ant colony algorithm is used to identify the routing path, and quantum computing is used to enrich the search process for identifying the join entity order.

Related Work
The key component of a query optimizer is the employed search methodology. There is an extensive and rich literature describing the optimization process and studying the utilized search techniques, indicating their significance. In a distributed database system (DDBS), failures in the midst of transaction processing (such as the failure of a site where a sub-transaction is being processed) directly affect the query optimizer and may lead to an inconsistent database query result [21]. As such, a recovery subsystem is an essential component of a DDBS. To ensure correctness, recovery mechanisms must be in place to guarantee transaction atomicity and durability even in the midst of failures.
There are three main search approaches utilized to determine the best QEP: exhaustive, heuristic-based, and randomized strategies. Dynamic programming (DP) is one of the most recognized exhaustive search strategies, and it is used as a search technique in most commercial databases. The basic DP algorithm used for query optimization is introduced in [9]. The optimizer repeatedly creates composite plans from smaller sub-plans until the overall plan is complete. Here, through pruning, a high-cost plan is discarded early when an alternative equivalent plan with a lower cost exists. Although this technique gives better performance than randomized strategies for queries with a small number of entities, randomized strategies are a much better fit for queries with a large number of entities.
Iterative improvement (II) is one of the best-known techniques categorized as randomized algorithms [22]. II initially chooses a random starting point. Then, the solution is improved by the repeated acceptance of random downhill moves until a local minimum is reached. This procedure is repeated until a predetermined halting condition is met; at that point, the algorithm returns the point with the minimal cost found. One of the primary disadvantages of II is that the final outcome is sometimes unsatisfactory, even when a large number of starting points are used. When the set of solutions includes a large number of high-cost local minima, II easily gets trapped in one of them.
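The II scheme described above can be sketched generically; the `cost`, `random_plan`, and `neighbors` callables below are placeholders standing in for any concrete plan representation, not the algorithm from [22]:

```python
import random

def iterative_improvement(cost, random_plan, neighbors, starts=10, rng=random):
    """II sketch: from several random starting plans, repeatedly accept any
    cost-reducing neighbor until a local minimum, then keep the best minimum."""
    best = None
    for _ in range(starts):
        plan = random_plan(rng)
        improved = True
        while improved:
            improved = False
            for nb in neighbors(plan, rng):
                if cost(nb) < cost(plan):   # accept the first downhill move
                    plan, improved = nb, True
                    break
        if best is None or cost(plan) < cost(best):
            best = plan
    return best

# Toy landscape with a single minimum at 3; any start converges to it.
best = iterative_improvement(
    cost=lambda x: (x - 3) ** 2,
    random_plan=lambda r: r.randint(0, 10),
    neighbors=lambda p, r: [p - 1, p + 1],
    starts=5,
    rng=random.Random(1),
)
print(best)   # 3
```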
A genetic algorithm (GA), another randomized algorithm, is introduced in [23]. Here, GA is presented as a technique to solve the query optimization problem and is tested against other methodologies; however, it does not consider modified crossover and mutation processes. In [24], the authors combined GA and the max-min ant system to create a query optimization methodology that enhances query efficiency. The advantage of parallelism is especially apparent when combining GA and the max-min ant system for a large number of relations. In comparison with other algorithms, this approach requires less query time, and, furthermore, the execution time of the optimal plan produced is reduced. Its shortcoming is that the computation time and cost increase due to the parallel processing of two algorithms.
As one of the stochastic algorithms, ACO has been used as a search methodology for optimizing queries in both centralized and distributed database environments [25]. In [26], the author proposed a multi-colony ant algorithm to optimize join queries in distributed systems in which tables can be replicated but cannot be partitioned or fragmented. In this algorithm, four types of ants cooperate to create an execution plan, so each iteration uses four ant colonies. To locate the optimal plan, every ant performs dynamic decision-making. Two kinds of cost models, based on total time and on response time, are used to assess the quality of the generated plan. In this algorithm, although the total time is reduced and the convergence speed is increased, performance is worse for small queries and the search can fall into a local optimum.
Quantum-inspired evolutionary algorithms are one of the key areas of research linking quantum computing with evolutionary algorithms. The theoretical applications of quantum-inspired evolutionary algorithms in various fields are presented for the first time in [27]. In [28], the author applies QIEA to locate minimum assignment costs in the quadratic assignment problem (a mathematical model for the assignment of a collection of economic activities to a collection of locations). The main contribution of this paper is to present how the algorithm is tailored to the problem, including crossover and mutation operators, and to set the overall framework for the utilization of quantum ideas in varied applications. In addition, QIEA has been applied along with genetic programming to improve the prediction accuracy of the toxicity degree of chemical compounds [29]. In this work, the accuracy of the linear equation used to calculate the degree of toxicity increased as a result of using genetic programming. Moreover, quantum computing helps in improving the selection of the best-of-run individuals and in handling parsimony pressure to decrease the complexity of solutions. Additionally, in [30], the author creates a new technique to find optimal threshold values at different levels of thresholding for color images, using a minimum cross-entropy-based thresholding method as the objective function. In this technique, the results are described in terms of the best threshold value, fitness measure, and computational time at various levels. Here, the convergence curves prove that the use of quantum-inspired concepts along with the ACO technique outperforms the classical ACO technique.
Our proposed algorithm is an extension of the work presented in [15]. It employs, as the model's cost, the total query time calculated for the distributed query optimization model with non-replicated entities. The processing and communication costs of the query plan are calculated dynamically and depend on the path used by the ants and the entities' site locations. No fixed cost exists on the edges of the problem graph; instead, the cost is calculated dynamically as an intermediate outcome while applying the query's joins. A quantum gate is used in the suggested model as a replacement for the ACO stochastic search to enhance the total cost and accelerate the search convergence.

The Proposed Technique
Our methodology for query optimization in a distributed database environment is presented in terms of the implementation of the search space, the method used to obtain the cost, and the search methodology utilized to find the best QEP. The search space implementation and cost calculation use the same concepts as [15], but the search methodology employs quantum computing to get the best QEP. Figure 2 shows the main components of the suggested optimization model and the way these components are linked together.

In Figure 2, the SQL statement is analyzed to identify the involved entities (tables). Then, the database statistics associated with each identified table are extracted from the database catalog tables. These statistics include the fields' lengths and types, the entity tuple length, the number of tuples in every entity, and the number of pages required to store the entity data. In addition, the entity site locations and the relations between entities are also obtained from the database catalog tables. The next step uses the information obtained from the catalog tables to create the search space. Finally, our search method is applied to the search space to obtain the best join order. The major components of the model are described in the following sections.

Cost Model
As in [15], the cost is calculated based on total time (the sum of all component costs), using the same calculation method as in [26]. Here, the total cost is obtained as the sum of the I/O cost, calculated for all join processes, and the data transfer (communication) cost, incurred whenever transferring entities between sites is necessary:

Total cost = Σ IO_join + Σ COM_Ri, (7)

where IO_join is the cost of the join process and COM_Ri represents the cost of transferring entity R_i between site locations. The IO_join cost is computed as:

IO_join = IO(S_k) × (P_join + P_write), (8)

where IO(S_k) is the disk I/O time per page at location S_k, P_write is the page count required to save the join outcome, and P_join is the page count accessed to perform the join between R_i and R_j. P_join is computed as:

P_join = P_Ri + P_Rj, (9)

where P_Ri and P_Rj are the page counts of entities R_i and R_j. P_write is computed as:

P_write = ⌈card(R_i ⋈ R_j) × len(R_i ⋈ R_j) / ps⌉, (10)

where card(R_i) is the tuple count in R_i, len(R_i) is the average tuple length in R_i, and ps is the page size. The cost required to transfer relation R_i from location S_k to location S_p is computed as:

COM_Ri = COM(S_k, S_p) × card(R_i) × len(R_i), (11)

where COM(S_k, S_p) is the time needed to move one byte from location S_k to location S_p.
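The component costs in Equations (8)-(11) are simple products, as the hedged sketch below shows; the 4096-byte page size and all numeric inputs are illustrative assumptions, not values from the paper:

```python
import math

def pages(card, length, ps=4096):
    """Pages needed to store `card` tuples of average byte length `length` (cf. Eq. (10))."""
    return math.ceil(card * length / ps)

def io_join_cost(io_time, p_ri, p_rj, p_write):
    """Eq. (8) with P_join = P_Ri + P_Rj (Eq. (9)): per-page disk time at the
    join site times the pages read plus the pages written."""
    return io_time * ((p_ri + p_rj) + p_write)

def com_cost(per_byte_time, card, length):
    """Eq. (11): per-byte transfer time between two sites times the entity size."""
    return per_byte_time * card * length

print(pages(1000, 100))   # 25 pages for ~100 KB at 4 KB per page
```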

Build Search Space
The catalog in the database is used primarily to store the schema, which contains information regarding the tables, indices, and views [1]. The information about tables includes the table name, the field names, the field lengths, the field types, and the integrity constraints between tables. Various statistics are also stored in the catalog, such as the number of distinct keys in a given attribute and the cardinality of a particular relation. In addition, the catalog includes information about the resident site of each table and the number of sites (locations) in the system, along with their identifiers and the replication status. This information extracted from the catalog tables is employed to build the search space. In our algorithm, the search space is implemented as a graph G = (N, E), where N represents the collection of vertices (nodes) and E represents the collection of edges (arcs) [4]. Every vertex in the graph denotes an entity (table) in the query specification. Two graph vertices are linked by an edge if the corresponding tables are joined in the query. Every vertex in the graph is represented as a class data structure and has a set of attributes such as the number of tuples, tuple length, keys, and site location. Figure 3 shows a graph representation of a search space that contains a set of entities and the relations among them.
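The vertex class just described can be sketched as follows; the field names and sample statistics are our own illustrative choices, not the paper's data structure:

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """One vertex of the search-space graph: a table plus its catalog statistics."""
    name: str
    tuples: int        # cardinality of the relation
    tuple_len: int     # average tuple length in bytes
    site: int          # resident site identifier
    joins: list = field(default_factory=list)   # names of directly joined entities

def build_graph(entities, join_pairs):
    """Link two vertices with an edge for every join predicate in the query."""
    g = {e.name: e for e in entities}
    for a, b in join_pairs:
        g[a].joins.append(b)
        g[b].joins.append(a)
    return g

g = build_graph(
    [Entity("customer", 10_000, 120, 0), Entity("orders", 50_000, 80, 1)],
    [("customer", "orders")],
)
print(g["customer"].joins)   # ['orders']
```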

Search Strategy
The search methodology of our proposed model depends on the QIACO metaheuristic. The cost of every journey of each ant to locate the minimal spanning graph, which represents the query search space, is dynamically calculated while building that graph, as in [15]. The dynamic calculation of the query cost depends on an additional virtual vertex that is added to the graph to carry the intermediate join outcome. The cost, in this case, is calculated between the virtual node and the next chosen entity in the join order. In the proposed algorithm, each entity is represented as one qubit in the form of Equation (5), and the quantum partial negation gate is used to identify the next entity in the join order. The flow of QIACO is described in the following steps:

Step 1. Initialization. In this phase, all the parameters used in the model are initialized experimentally, following the work in [31]. Minor changes were made to the constants α, β, and ρ: α is set to 3 instead of 1, β to 2 instead of 5, and ρ to 0.02 instead of 0.1. These changes increase the dependence of our model on the cost, instead of the pheromone, and give a better result when identifying the next entity in the join order. The number of ants is determined, and the pheromone trails are initialized: all pheromone values start from a small value equal to 1/√(No. of Entities). The query graph that links the tables (entities) is generated such that every table is connected to all other tables. In this phase, each entity is associated with a qubit, and all entities' qubit probability amplitudes are initialized as a_i = b_i = 1/√2, which satisfies Equations (5) and (6). All the parameters of the algorithm are initialized in this phase according to Table 1.
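Step 1 can be sketched in a few lines; this is an illustration of the stated initial values (pheromone 1/√n, amplitudes 1/√2), not the paper's code:

```python
import math

def initialize(n_entities):
    """Step 1: pheromone 1/sqrt(n) on every arc and one equal-superposition
    qubit (a = b = 1/sqrt(2)) per entity."""
    tau0 = 1.0 / math.sqrt(n_entities)
    tau = [[tau0] * n_entities for _ in range(n_entities)]
    qubits = [(1 / math.sqrt(2), 1 / math.sqrt(2)) for _ in range(n_entities)]
    return tau, qubits

tau, qubits = initialize(4)
print(tau[0][1])   # 1/sqrt(4) = 0.5 for four entities
```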

Table 1 (excerpt). Parameter settings:
Max-Iteration: varies with the number of entities.
Ant-Numbers: varies with the number of entities.

Step 2. For each ant, select a random entity and use it as the starting point of the ant's journey. This entity is transferred to the virtual vertex, where it waits for the next entity to be chosen for the join process.
Step 3. Use the partial negation quantum gate to choose the next entity in the join sequence. The selection is done by applying the gate, as an operator, to the qubits of all entities that have a connection path with the current entity. This operation is applied a number of times according to the amount of pheromone raised on the path between the current entity and the connected entities. Then, the entity with the best probability is selected as the next entity. Let X be the Pauli-X gate, which is the quantum equivalent of the NOT gate and is represented as:

X = | 0 1 |
    | 1 0 |. (12)

The c-th partial negation operator V is the c-th root of the X gate and can be calculated using diagonalization as follows:

V = (1/2) | 1 + t   1 − t |
          | 1 − t   1 + t |, (13)

where t is the c-th root of −1:

t = (−1)^(1/c). (14)

This gate is represented in Figure 4 [32].
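The matrix form of V can be checked numerically. The sketch below builds V for a given c, taking t as the principal c-th root of −1, and verifies that applying V c times recovers X; it is an illustration of Equations (13)-(14), not the paper's code:

```python
import cmath

def partial_negation(c):
    """c-th root of the Pauli-X gate: V = (1/2) [[1+t, 1-t], [1-t, 1+t]]
    with t the principal c-th root of -1 (Eqs. (13)-(14))."""
    t = cmath.exp(1j * cmath.pi / c)
    return [[(1 + t) / 2, (1 - t) / 2],
            [(1 - t) / 2, (1 + t) / 2]]

def matmul(A, B):
    """2x2 complex matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Applying V c times recovers the full NOT gate: V^c = X.
c = 3
V = partial_negation(c)
M = V
for _ in range(c - 1):
    M = matmul(M, V)
print([[round(abs(x), 6) for x in row] for row in M])   # ~[[0, 1], [1, 0]]
```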

Applying the operator V on a qubit d times is equivalent to:

V^d = (1/2) | 1 + t^d   1 − t^d |
            | 1 − t^d   1 + t^d |, (15)

such that if d = c, then V^d = X. When d = c = 2, this gives t² = −1, and so:

V² = (1/2) | 1 + t²   1 − t² |
           | 1 − t²   1 + t² | = | 0 1 |
                                 | 1 0 | = X. (16)

In our model, the V gate is used as an operator and is conditionally applied c times to every entity's qubit. The number of times c that the V gate is applied to an entity's qubit is based on the pheromone value and the join cost. Here, c is defined using Equation (3) as:

c = τ_ij^α η_ij^β / Σ_{l ∈ N(s_p)} τ_il^α η_il^β, (17)

where τ_ij is the pheromone trail on the arc that connects entity i with entity j, and η_ij is the heuristic desirability, computed as the inverse of the intermediate join cost between entities i and j. Here, the total join cost is identified by Equations (4) and (7). The amplitudes of each entity are updated at time t + 1 as:

| a_i(t + 1) |       | a_i(t) |
| b_i(t + 1) | = V^c | b_i(t) |. (18)

After that, the suggested system uses the tensor product ⊗, a way of putting vector spaces together to obtain a larger vector space, so that for n entities:

|ψ⟩ = |q_1⟩ ⊗ |q_2⟩ ⊗ . . . ⊗ |q_n⟩. (19)

The vector obtained in Equation (19) is then normalized so that the number of elements equals the number of entities in the model. In this case, each element of the normalized vector represents the probability of the corresponding entity being the next entity in the query join order.
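As a rough illustration of how repeated partial negation biases the measurement outcome, the sketch below applies the fractional NOT X^p to a qubit starting in |0⟩; treating the exponent p as the pheromone-driven quantity of Equation (17) is our simplifying assumption, not the paper's exact update:

```python
import cmath

def x_power(p):
    """Fractional NOT gate X^p: larger p moves more amplitude toward |1>."""
    t = cmath.exp(1j * cmath.pi * p)   # principal value of (-1)^p
    return [[(1 + t) / 2, (1 - t) / 2],
            [(1 - t) / 2, (1 + t) / 2]]

def prob_one(qubit, p):
    """Probability of measuring |1> after applying X^p to the qubit."""
    a, b = qubit
    G = x_power(p)
    b_new = G[1][0] * a + G[1][1] * b
    return abs(b_new) ** 2

# Starting in |0>, a stronger pheromone/desirability value (larger p) raises
# the chance of this entity being picked as the next join candidate.
probs = [prob_one((1.0, 0.0), p) for p in (0.1, 0.5, 0.9)]
print([round(x, 4) for x in probs])   # increasing: 0.0245, 0.5, 0.9755
```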
Step 4. Instead of selecting the entity with the highest probability in the normalized vector as the next entity in the join order, the roulette wheel method is used so that entities with small probabilities also get a chance to take part in building the query join order.
Step 5. After choosing the next entity, the join cost is determined, and the outcome of the join process is transferred to the virtual vertex, where it is afterwards used as the start-up entity for the following ant's cycle. Equations (7)-(11) are used to determine the join cost.
Step 6. Repeat Steps 3 to 5 until all entities are handled; then the journey cost for the ant is calculated.
Step 7. Select the best journey cost among all the ants' journeys.
Step 8. Pheromone update. Once all the ants have built their solutions in a cycle, the pheromone update is performed. Every pheromone amount is reduced, to simulate pheromone evaporation, and raised, to simulate the ants' pheromone deposits on the trail. The pheromone on all of the graph's arcs is modified using Equation (1). The modification of the pheromone also depends on Lk in Equation (2), which represents the total cost of the best tour created by ant k.
Step 9. Repeat Steps 2 to 8 until the maximum number of iterations is reached; when the cycles are complete, the best trail over all ants is chosen.
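The core of the ant cycle above (Steps 4 and 8) can be sketched as follows. This is a hedged illustration: `roulette_select` and `update_pheromone` are hypothetical helper names, and the pheromone rule shown is a generic Ant-System-style stand-in for the paper's Equations (1) and (2), with evaporation rate ρ and a deposit inversely proportional to the tour cost Lk.

```python
import numpy as np

def roulette_select(probs, rng):
    # Step 4: roulette wheel over the normalized probability vector;
    # entities with small probabilities still get a proportional chance.
    cum = np.cumsum(probs)
    return int(np.searchsorted(cum, rng.random() * cum[-1]))

def update_pheromone(tau, tour, L_k, rho=0.02):
    # Step 8: evaporate pheromone on every arc, then deposit 1/L_k
    # along the arcs of the ant's tour (Ant-System-style stand-in).
    tau *= (1.0 - rho)
    for i, j in zip(tour, tour[1:]):
        tau[i, j] += 1.0 / L_k
    return tau

rng = np.random.default_rng(0)
probs = np.array([0.05, 0.15, 0.30, 0.50])   # normalized vector from Step 3
next_entity = roulette_select(probs, rng)

tau = np.ones((4, 4))                         # pheromone matrix over 4 entities
tau = update_pheromone(tau, [0, 2, 1, 3], L_k=50.0)
```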
The pseudo-code for the suggested model is presented in Algorithm 1, the partial negation gate in Algorithm 2, and the selection of the next entity in Algorithm 3.
Convergence of the Model. In [33], the author proves that the convergence of ACO depends on the pheromone value τij from Equation (1) and the heuristic information ηij from Equation (4). The suggested model still depends on Equation (1) for updating the pheromone. Additionally, Equation (4) represents the inverse of the cost, which is used in the suggested model to find the best route obtained by the ants. So, the convergence of the suggested model is guaranteed.
Complexity analysis. As in [34], the computational complexity of most classical ant colony algorithms depends on the number of nodes n, the number of ants m, and the number of iterations T (the colony lifetime). Considering Equation (3) in more detail, we can notice that the computational complexity of the algorithm also depends on the parameters α and β. Here, the computational complexity is O(T·m·n²·(log₂ α + log₂ β)).
In our proposed QACO algorithm, a single quantum superposed state is prepared to encode each node n in the search space; thus, exploring all nodes in a single iteration takes O(n) time. The selection of the next entity in the join order depends on the negation gate, which is represented mathematically by vector products with O(n) time. Here, the vector product is applied to all n nodes, so the selection process runs in O(n × n) = O(n²) time. The overall computational time for the quantum part of our model is therefore O(n) + O(n²), which reduces to O(n²). Hence, the computational complexity of our QACO algorithm is the classical O(T·m·n²·(log₂ α + log₂ β)) term plus the quantum O(n²) term per iteration. QACO has a computational complexity greater than the classical ant colony, but gives better results, as explained in the experimental results section.

• This algorithm uses the total query time calculated for distributed query optimization as the model's cost.
• The algorithm applies to non-replicated entities.

Experimental Results
In this section, several experiments are presented to study the efficiency of our proposed model. Furthermore, to evaluate its performance, the accuracy of our proposed model is compared with traditional optimization techniques. The classical distributed database query optimization from [15] is implemented as a modified version of the C# code that solves the traveling salesman problem (TSP) with an ant colony optimization (ACO) algorithm, originally created for Microsoft MSDN Magazine [35]. The quantum version is implemented using C# along with MATLAB.
To test the proposed model, two types of datasets are used: synthetic and benchmark. The first group of tests (experiments 1, 2, and 3) is performed on a synthetic dataset, randomly generated by a problem generator to simulate joins over different numbers of entities. The problem generator has two parts: the database generator and the query generator. The first part generates a synthetic database given the number of relations; during this part, the cardinality, the tuple length, and the join attribute of each relation are defined, as well as the number of the site where each entity resides. The relation cardinalities, the tuple sizes, and the number of sites are randomly generated in the ranges [10, 100], [10, 50], and [2, 10], respectively. The second part generates a chain query based on the schema produced by the first part, taking the number of required joins as input; for example, the generator can produce a query with four joins (e.g., QJ1, QJ2, QJ3, and QJ4). The queries generated this way are random and do not depend on a specific application or database. The second group of tests (experiments 4 and 5) is performed on the TPC-H benchmark dataset, as used in [36]. The TPC-H benchmark is a relational database that, in our experiments, is distributed vertically over different site locations. It is a decision-support benchmark that consists of a suite of business-oriented ad hoc queries and concurrent data modifications, and it is used to examine large volumes of data and execute queries with a high degree of complexity. Tables 2 and 3 show the table sizes, as in [36], and the data distribution over the different sites.
Table 2. TPC-H tables [36].
All tests and experiments are conducted on a PC with an Intel Core i5 2.40 GHz processor and 8 GB DDR main memory, running Microsoft Windows 7 Enterprise 64-bit.
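A minimal sketch of such a problem generator, under the stated ranges (cardinality in [10, 100], tuple size in [10, 50], site in [2, 10]), might look like the following. The function and field names are illustrative assumptions, not the paper's code.

```python
import random

def generate_schema(num_relations, seed=0):
    # Part 1: synthetic database generator. For each relation, draw the
    # cardinality, tuple size, and resident site from the paper's ranges.
    rng = random.Random(seed)
    return [
        {
            "relation": f"R{i}",
            "cardinality": rng.randint(10, 100),
            "tuple_size": rng.randint(10, 50),
            "site": rng.randint(2, 10),
        }
        for i in range(num_relations)
    ]

def generate_chain_query(schema, num_joins):
    # Part 2: a chain query joins consecutive relations,
    # e.g. R0 ⋈ R1 ⋈ ... for the requested number of joins.
    return [(schema[i]["relation"], schema[i + 1]["relation"])
            for i in range(num_joins)]

schema = generate_schema(5)
query = generate_chain_query(schema, 4)   # four joins, as in QJ1..QJ4
```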
All experiments are run with fixed parameters α = 3, β = 2, and ρ = 0.02, as explained in Table 1. The page size, the network transfer rate, and the I/O access rate are set to 1024, 0.98 × 10⁻³, and 0.98 × 10⁻⁴, respectively.
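For illustration, these parameters can be wired into a toy cost function. This is only an assumed stand-in for the paper's actual cost model (Equations (7)-(11), not reproduced here); the per-page charging scheme and the function name are hypothetical.

```python
PAGE_SIZE = 1024          # bytes per page (paper's setting)
NET_RATE = 0.98e-3        # network transfer rate (paper's setting)
IO_RATE = 0.98e-4         # I/O access rate (paper's setting)

def toy_transfer_cost(result_bytes, remote=True):
    # Assumed scheme: each page of an intermediate join result pays an
    # I/O cost, and shipping it between sites adds a network cost
    # only when the joined entities reside on different sites.
    pages = -(-result_bytes // PAGE_SIZE)   # ceiling division
    cost = pages * IO_RATE
    if remote:
        cost += pages * NET_RATE
    return cost
```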
In the first experiment, as shown in Figures 5-8, the minimum costs obtained by the quantum-inspired model are compared with the minimum costs obtained by classical ACO, as in [15], with a fixed number of sites equal to 5, different numbers of ants (from one to five), and different numbers of entities (5, 10, 15, and 20).

Table 3 distributes the TPC-H tables over the sites; for example, site 2 holds orders and lineitem, site 3 holds nation, region, and customer, and site 4 holds part.
In Figure 5, a small search space resulted from the small number of entities (5), and QIACO with one ant was enough to cover the whole search space. So, the optimum solution (cost) was obtained by a single ant, and adding extra ants to the model does not lead to a better solution. In the case of 10, 15, and 20 entities, the search space becomes larger, and adding more ants positively affected the obtained result. The worst, average, and best costs for the different numbers of ants used in this experiment are summarized in Table 4.
The performance was calculated as the improvement percentage between the average cost in ACO and the average cost in QIACO. From Table 4, we can conclude that the QIACO algorithm produces a better cost than ACO for different numbers of ants and for different numbers of join entities. When the number of entities is increased, the search space increases exponentially, and the classic ACO with few iterations cannot cover this search space. So, in the case of 15 and 20 entities, the classical ACO with 300 iterations was compared with QIACO with only 100 iterations; even so, QIACO reaches a better cost than ACO. In the case of a query with 5 entities, the search space contained few alternative solutions, and the improvement percentage between classic ACO and quantum ACO was not more than 13%. But when the number of entities increased, the corresponding search space increased exponentially and contained a huge number of alternative solutions; in this case, the effect of the quantum search appears. Here, the improvement percentage for QIACO over classic ACO ranged from 77% to 99%.
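The improvement metric described above can be written as follows (an assumed definition, consistent with the reported 13-99% figures):

```python
def improvement_pct(avg_aco_cost, avg_qiaco_cost):
    # Percentage improvement of QIACO over classical ACO,
    # relative to the ACO average cost.
    return 100.0 * (avg_aco_cost - avg_qiaco_cost) / avg_aco_cost

# e.g., an ACO average of 100 reduced to 23 by QIACO is a 77% improvement
```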
In the second experiment, to test the efficiency of the proposed algorithm in the case of one central database and different numbers of join entities, the number of sites is set to one and the number of join entities ranges from 3 up to 15. Many database applications have complicated queries with a massive number of joins that may reach 100 or more; moreover, some business applications, such as banking and retail systems, contain queries with fewer than 10 joins. The maximum numbers of joins used in [26] and [12] were 10 and 15, respectively, so a maximum of 15 joins is appropriate for our experiment. Over 10 runs, a comparison is conducted between the average costs for classical ACO and QIACO with a fixed number of ants equal to 2 and a fixed number of iterations equal to 100, as in Figure 9. Here, with fewer joins (less than 5), the same (best) cost was obtained by ACO and QIACO. When extra entities are added to the joined query, the search-space complexity increases exponentially. Starting from 9 entities, QIACO gives a better average cost than classical ACO. The classical ACO cannot cover all the search space and may fall into a local minimum, but the diversity in QIACO covers a much larger space and leads to a better cost.
In the third experiment, the suggested model was tested for a fixed number of entities (10 entities) and different numbers of sites (from 1 to 5 sites). Over 10 runs, two ants were used with a fixed number of iterations equal to 100. Figures 10 and 11 display the minimum and average costs for QIACO and classical ACO.
As shown in the figures, with the increase in the number of sites, the minimum and average costs also increase, but QIACO still gives a better cost than classical ACO. The results reveal that QIACO achieves a better performance, regarding both the minimum cost and the average cost, than classical ACO regardless of the number of sites used.
In the fourth experiment, four queries from the TPC-H benchmark, shown in Figure 12, are used to test the performance of QIACO versus classical ACO. These queries are chosen because they fetch data from many tables. Here, different numbers of ants, from one to five, were employed to get the cost of each query using both methodologies, classical ACO and QIACO, and then the average cost for each query was calculated. As shown in Figure 13, although the average cost in all queries tends to favor quantum ACO, this appears most clearly in the case of query No. 8. Query No. 8 produces a larger search space because it involves more tables than the other queries; in this case, QIACO can cover the larger search space, which leads to a better QEP. The effect of using more ants to seek a better QEP for query No. 8 is shown in Figure 14. Although QIACO gives a better cost than ACO for all numbers of ants, increasing the number of ants leads to a better result in both methods. When the number of ants is five, both methodologies can cover a larger space, so ACO and QIACO reach the same QEP that gives the same optimum cost.
In the next experiment, the average time used by different numbers of ants to get the best QEP for query No. 8 in ACO and QIACO was compared, as shown in Figure 15. Although adding more ants leads to better cost results (as seen in Figure 14), it comes at the expense of complexity and therefore time. In QIACO, the complexity of computing the tensor product used while merging the qubits negatively affects the execution time, and this complexity increases with the number of ants added to the model. As shown in Figure 15, when the number of ants is one, the effect of the tensor product in QIACO does not exist, and in this case the time required to get the best QEP in QIACO is less than with classical ACO. When the number of ants increases, the tensor product clearly affects the search time; in this case, QIACO needs more time to reach the best QEP than classical ACO.


Conclusions
In this work, we proposed a novel ACO methodology based on the quantum-inspired paradigm to optimize a query by finding the best execution path, in terms of cost, for running the query on a distributed database. In the QIACO model, each entity is represented as a qubit, and the quantum partial negation gate is employed to update the entity's qubit amplitudes. The gate is conditionally applied on the entity's qubit based on the pheromone value and the join cost. In our model, the average cost obtained by QIACO is better than the cost obtained by the classical ACO. This is because, in QIACO, a much wider solution space can be analyzed, due to the structure of the model, which is not prescribed in advance but is left to the system, arising from qubit superposition via quantum gates. By merging the ACO and quantum superposition concepts, we successfully improved the cost of running queries over the distributed database. The results show that the cost obtained by QIACO is better than that of the classical ACO; the improvement in performance ranged, in some cases, between 77% and 99%, but at the expense of time, since the tensor product used in QIACO affects the search time, so QIACO needs more time to reach the best QEP than the classical ACO. Additionally, the results imply that, although the classical ACO is used successfully with simple joins that have a small number of join entities, QIACO can be used with both simple and complex queries with numerous join entities. In future work, the plan is to use the hyper-graph, instead of the classical graph, to represent the search space. Using the properties and algorithms of sets in the hyper-graph when representing the search space may affect the search methodology and so enhance the query cost.
In addition, the complexity of our proposed algorithm and the results obtained with it will be compared with other quantum versions of swarm intelligence algorithms, such as particle swarm optimization, artificial bee colony optimization, and the firefly algorithm.