A Transformation-Based Quantum Physical Synthesis Approach for Nearest-Neighbor Architectures

Abstract: The physical synthesis concept for quantum circuits, that is, the interaction between the synthesis and physical design processes, was first introduced in our previous work. This concept inspires techniques that minimize the number of extra SWAP operations inserted to run a circuit on a nearest-neighbor architecture. Minimizing the number of SWAP operations potentially decreases the latency and error probability of a quantum circuit. Focusing on this concept, we present a physical synthesis technique based on transformation rules to decrease the number of SWAP operations in nearest-neighbor architectures. After the qubits of a circuit are mapped onto the physical qubits provided by the target architecture, the resulting mapping information is fed to our procedure. Our method uses the obtained placement and scheduling information to apply transformation rules to the original netlist so that fewer extra SWAP gates are required to run the circuit on the architecture. We follow two policies in applying a transformation rule: a greedy policy and a simulated-annealing-based policy. Simulation results show that the proposed technique decreases the average number of extra SWAP operations by about 20.6% and 24.1% under the greedy and simulated-annealing-based policies, respectively, compared with the best results in the literature.


Introduction
Quantum computing is the branch of computer science research concerned with developing computers based on quantum theory, which describes the nature and behavior of matter [1]. The invention of quantum computers represents a considerable leap in processing capability [2]. A quantum computer gains its processing power by following the laws of quantum physics: it can occupy several states at once and thereby act on all possible permutations of its inputs at the same time. This is a fundamental difference between classical computers and next-generation quantum computers. A classical computer performs preset commands according to the rules of classical physics, whereas a quantum computer exploits phenomena unique to quantum mechanics to realize a new mode of information processing. In a classical computer, information is encoded as a series of bits, and these bits are manipulated by Boolean logic gates applied in sequence to obtain the final result. In a quantum computer, atoms and other microscopic particles are used to process information instead of transistors and conventional circuits. An atom can act as a quantum bit of memory, and information can be transferred from one location to another over optical fiber [3,4].
Physical design is one of the two main processes of quantum circuit design; the other is synthesis. The two processes were traditionally performed separately because integrating them into one monolithic process makes the complexity of the design process unmanageable [5]. However, without interaction between physical design and synthesis, the generated layout was not good. To address this issue, the physical synthesis concept [5], that is, the interaction between the synthesis and physical design processes, was introduced for quantum circuits in [6]. Physical synthesis modifies the netlist or layout, taking layout information into account, to improve the objectives (e.g., latency) or meet the design constraints.
Quantum Rep. 2021, 3 436
In most physical platforms, performing quantum gates on non-adjacent qubits is error-prone and is sometimes prevented outright by the target technology [7]. Hence, quantum gates are restricted to neighboring qubits. If the qubits involved in a gate are not adjacent in the physical environment, a communication channel of SWAP gates must be constructed before the gate can be performed. These additional SWAP gates increase the latency and error probability of the original quantum circuit [8]. Therefore, the fewer SWAP gates are added, the faster a quantum circuit executes. Minimizing the number of SWAP gates is an NP-hard problem [9]. Several heuristic approaches have been proposed that map the qubits of a circuit onto nearest-neighbor architectures so as to minimize the number of SWAP gates [10][11][12][13]. Focusing on this problem, we propose a transformation-based physical synthesis technique to decrease the number of SWAP gates. This procedure takes the mapping information and uses transformation rules to substitute parts of the netlist so that fewer SWAP gates need to be inserted.
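As a rough illustration of why non-adjacent qubits cost SWAP gates, the sketch below counts the SWAPs needed to bring two qubits next to each other on a 2D square lattice. It rests on a simplified model that is an assumption and is not taken from the mapping approach of [11]: qubits sit at integer grid coordinates, each SWAP moves one qubit by one lattice step, and the side effects on the qubits displaced along the way are ignored. The function name `swaps_to_adjacent` is hypothetical.

```python
def swaps_to_adjacent(pos_a, pos_b):
    """Minimum SWAPs to make two grid qubits nearest neighbors.

    Assumed model: each SWAP moves one qubit one step along a row or
    column, so the count is the Manhattan distance minus one.
    """
    (row_a, col_a), (row_b, col_b) = pos_a, pos_b
    distance = abs(row_a - row_b) + abs(col_a - col_b)
    if distance == 0:
        raise ValueError("two qubits cannot share a lattice site")
    return distance - 1


# Adjacent qubits need no SWAP; qubits three steps apart need two.
print(swaps_to_adjacent((0, 0), (0, 1)))  # 0
print(swaps_to_adjacent((0, 0), (1, 2)))  # 2
```

Under this model, any rewrite that replaces a long-range gate with gates on closer qubits lowers the SWAP count, which is the effect the transformation rules aim for.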
The rest of the paper is organized as follows. Section 2 reviews prior work. Section 3 presents the main idea of the transformation method and our strategy for applying it. Sections 4 and 5 present the experimental results and the conclusions, respectively.

Related Work
This section is divided into two parts. The first covers works that use transformation rules for logic synthesis or post-synthesis optimization. The second covers research on physical synthesis in quantum circuits.
The work in [14] presented the idea of local transformation of reversible circuits. While the main purpose of that work was not post-synthesis optimization, its idea was extended by other researchers to improve circuit costs. The authors defined a canonical form for circuits in the NCT library and introduced a complete set of rules to transform any NCT-constructible circuit into its canonical form, which may or may not be compact. Shende et al. [15] proposed a new rule for the simplification of reversible circuits in the NCT library. The concept of applying a rule set was extended in [16], where the authors introduced several transformation rules based on a set of predefined patterns called templates. In [17], template matching with up to six gates was used in post-synthesis optimization. Similarly, Toffoli-Fredkin templates were explored in [18,19]. Toffoli templates were expanded in [20,21] by the addition of all templates of size 7 (five templates) and a set of templates of size 9 (four templates). Maslov et al. [22] used templates and rules to simplify quantum circuits, such as a 10-gate quantum network for a 3-qubit full adder. Lu et al. [23] proposed equivalent circuits and introduced equations to simplify a quantum circuit as much as possible. Saeedi et al. [24] extended the templates to work with up to three SWAP gates. Arabzadeh et al. [25] proposed a set of simplification rules in terms of positively and negatively controlled Toffoli gates. The optimization in [26] first used a window to select potential subcircuits. Abdessaied et al. [27] used Boolean satisfiability for template matching. In [28], a systematic method of generating all templates with a given number of lines was presented. Bandyopadhyay et al. [29] proposed a post-synthesis optimization technique for reversible circuits based on newly defined templates. In their work, templates are applied in a specific order over the input circuit, which is searched exhaustively for possible replacements.
The physical synthesis concept for quantum circuits was first introduced in [6], and several physical synthesis techniques for ion-trap technology were proposed in subsequent papers [30][31][32][33].
Most of the above-mentioned papers operate at the circuit level. In contrast, our focus is on the physical design level, after the physical mapping onto a nearest-neighbor architecture is done.

Transformation-Based Physical Synthesis
Physical synthesis modifies the netlist or layout, taking layout information into account, to improve the objectives (e.g., latency or error probability) or meet the design constraints. The scheme proposed in this paper uses transformation rules to achieve the desired improvements. These templates are composed of gates implementable in nearest-neighbor technologies [34]. To apply them to a circuit, we present a flow. In the rest of this section, we first introduce the transformation rules and then present the flow.
Each transformation rule consists of two equivalent sequences of gates. The first sequence is matched against a section of the circuit, and when a match is found, that section is replaced with the equivalent sequence. Figure 1 shows the transformation rules used in our approach. These rules are easily verified by checking that the matrices of the two sequences are equal.
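The matrix check mentioned above can be sketched in a few lines. Because the rules of Figure 1 are not reproduced in the text, the example uses a textbook identity as a stand-in, which is an assumption and not necessarily one of the paper's rules: conjugating a CNOT with Hadamard gates on both qubits exchanges its control and target. The helper names `matmul`, `kron`, and `approx_equal` are hypothetical.

```python
import math


def matmul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def kron(a, b):
    """Kronecker (tensor) product of two matrices."""
    return [[x * y for x in row_a for y in row_b]
            for row_a in a for row_b in b]


def approx_equal(a, b, tol=1e-9):
    """Entry-wise comparison that tolerates floating-point rounding."""
    return all(abs(x - y) <= tol
               for row_a, row_b in zip(a, b)
               for x, y in zip(row_a, row_b))


s = 1 / math.sqrt(2)
H = [[s, s], [s, -s]]

# Basis order |q0 q1> = 00, 01, 10, 11.
CNOT_01 = [[1, 0, 0, 0],   # control q0, target q1
           [0, 1, 0, 0],
           [0, 0, 0, 1],
           [0, 0, 1, 0]]
CNOT_10 = [[1, 0, 0, 0],   # control q1, target q0
           [0, 0, 0, 1],
           [0, 0, 1, 0],
           [0, 1, 0, 0]]

# Left-hand sequence of the stand-in rule: H on both qubits, CNOT, H on both.
HH = kron(H, H)
lhs = matmul(matmul(HH, CNOT_01), HH)

print(approx_equal(lhs, CNOT_10))  # True
```

The same comparison applies to any candidate rule: build the matrix of each gate sequence and test the two for equality.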
Figure 2 shows the proposed flow for applying the transformation approach. The flow takes an optimized gate-level netlist as input and generates a scheduled mapping. We use the mapping approach proposed in [11] to map the input netlist onto the architecture.
After the mapped circuit is built, our optimization loop starts. Our greedy policy for transformation is as follows. The first sequence of a rule is searched for in the input netlist, and wherever it is found, the equivalent sequence is tentatively substituted at that location. The number of SWAP gates is then recalculated. If the number of SWAP gates decreases, the replacement is accepted; otherwise, it is rejected. When the search for the first sequence is completed, the search for the next sequence follows with the same mechanism. The optimization loop continues until all sequences have been examined.
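The greedy policy can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: gates are tuples `(name, qubit_a, qubit_b)`, the SWAP count comes from a static placement (Manhattan distance minus one per gate) instead of the mapper of [11], and the single rule is a standard CNOT identity chosen only for the example. All function and variable names are hypothetical.

```python
def static_swap_cost(circuit, placement):
    """Assumed cost model: SWAPs per gate = Manhattan distance - 1."""
    total = 0
    for _name, a, b in circuit:
        (row_a, col_a), (row_b, col_b) = placement[a], placement[b]
        total += max(abs(row_a - row_b) + abs(col_a - col_b) - 1, 0)
    return total


def greedy_rewrite(circuit, rules, swap_cost):
    """Try every rule at every position; keep only improving rewrites."""
    best = list(circuit)
    best_cost = swap_cost(best)
    for pattern, replacement in rules:
        i = 0
        while i + len(pattern) <= len(best):
            if best[i:i + len(pattern)] == pattern:
                # Tentatively substitute the equivalent sequence.
                candidate = best[:i] + replacement + best[i + len(pattern):]
                cost = swap_cost(candidate)
                if cost < best_cost:  # accept only if fewer SWAPs
                    best, best_cost = candidate, cost
            i += 1
    return best, best_cost


# Three qubits placed on one row of the lattice (illustrative placement).
placement = {0: (0, 0), 1: (0, 1), 2: (0, 2)}
circuit = [("CNOT", 0, 1), ("CNOT", 0, 2), ("CNOT", 1, 2)]

# Stand-in rule: CNOT(0,2) equals CNOT(0,1) CNOT(1,2) CNOT(0,1) CNOT(1,2).
rules = [(
    [("CNOT", 0, 2)],
    [("CNOT", 0, 1), ("CNOT", 1, 2), ("CNOT", 0, 1), ("CNOT", 1, 2)],
)]

new_circuit, new_cost = greedy_rewrite(
    circuit, rules, lambda c: static_swap_cost(c, placement))
print(static_swap_cost(circuit, placement), new_cost)  # 1 0
```

In this toy case the rewrite trades one long-range CNOT for four nearest-neighbor CNOTs, so the gate count rises while the SWAP count falls to zero. Whether such a trade is worthwhile in practice depends on the cost model.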

An Example
In this section, an example is given to illustrate our transformation-based physical synthesis approach. Figure 3a,b shows a template and a quantum circuit operating on q0, q1, q2, and q3, respectively. It can be easily verified by matrix multiplication that the two circuits shown in Figure 3a are functionally equal. The circuit in Figure 3b has six gates and four qubits. If the initial locations of the qubits on the lattice are as in Figure 3c, four SWAP gates are needed, as shown in Figure 3d. However, when the two-gate template is replaced by its equivalent, the circuit is transformed into the one shown in Figure 3e, which needs only two SWAP gates, as shown in Figure 3f.


Experimental Results
To evaluate the proposed approach, we applied it to the benchmark circuits from [34]. We targeted the number of SWAP gates as the objective function and reduced it to minimize the latency and error probability of the circuits. Although our approach is applicable to all kinds of nearest-neighbor architectures, we applied it to the 2D square lattice topology to compare it with the previous approach [11]. Table 1 shows the number of SWAP gates and the run time of the benchmark circuits resulting from the prior physical design flow [11] and from our physical design flow enhanced by the template-matching physical synthesis technique. The numbers of SWAP gates obtained by the prior physical design flow and by ours are shown in the columns "Prior Physical Design Flow" and "Our Physical Design Flow," respectively. The column "Improvement" shows the improvement in the number of SWAP gates achieved by the physical synthesis approach proposed in this paper. As can be seen, an average improvement of 20.6% was achieved in the number of SWAP gates of the benchmarks. The columns "Prior Physical Design Flow" and "Our Physical Design Flow" under "Run Time" show the run time of the prior physical design flow and the run time of our optimization technique, respectively. The last column gives the runtime overhead imposed by our optimization approach.
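One plausible reading of the "Improvement" column is the relative reduction in SWAP gates with respect to the prior flow. The text does not define the column, so this formula is an assumption, as is averaging the per-circuit percentages. The sketch below shows the arithmetic; the function names are hypothetical and the numbers are invented for illustration and are not taken from Table 1.

```python
def improvement_percent(prior_swaps, our_swaps):
    """Relative reduction in SWAP gates, in percent (assumed definition)."""
    if prior_swaps <= 0:
        raise ValueError("prior SWAP count must be positive")
    return 100.0 * (prior_swaps - our_swaps) / prior_swaps


def average_improvement(pairs):
    """Mean of per-circuit improvements (assumed averaging rule)."""
    values = [improvement_percent(prior, ours) for prior, ours in pairs]
    return sum(values) / len(values)


# Made-up (prior, ours) SWAP counts for two hypothetical circuits.
example_pairs = [(50, 40), (10, 9)]
print([improvement_percent(p, o) for p, o in example_pairs])
print(average_improvement(example_pairs))
```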

1 All results of this section were obtained on a Core i3 with 6 gigabytes of memory.
2 As calculated by the Rational Quantify suite.

Heuristic Algorithm Analysis
As stated before, we followed a greedy approach to accept or reject a substitution: substitutions that did not reduce the number of SWAP gates were rejected. To examine the impact of another heuristic on the result, we used the simulated annealing (SA) heuristic [35] to accept or reject substitutions. Table 2 shows the results. The column "Our Approach Based on SA" under "Number of SWAP Gates" shows the number of SWAP gates obtained by our physical design flow when the simulated annealing heuristic is substituted for our greedy approach. The columns "Our Approach Based on Greedy" and "Our Approach Based on SA" under "Run Time" show the run times of our physical design flow using the greedy approach and the simulated annealing approach, respectively. The column "SA/Greedy Ratio" under "Number of SWAP Gates" contains the ratio of the number of SWAP gates obtained by simulated annealing to that achieved by our greedy approach. The column "Improvement" shows the improvement in the number of SWAP gates achieved by the physical synthesis approach based on SA. The column "Overhead" gives the runtime overhead imposed by the SA approach. The last column gives the ratio of the run time of the flow based on simulated annealing to that of the flow based on our greedy approach. It can be observed from the table that simulated annealing provided slightly better results than the greedy approach in most cases. However, on average, the run time of simulated annealing is about 6.49 times longer. This observation suggests that while various heuristics may provide slightly different results, it is the execution time that varies the most among them. In other words, execution time appears to be the determining factor in choosing among the heuristic approaches. Based on this, we chose the greedy approach for the remainder of this paper.
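For readers unfamiliar with simulated annealing, the sketch below shows how its acceptance rule differs from the greedy one: a substitution that increases the cost can still be accepted, with a probability that shrinks as the temperature drops. The text does not give the SA parameters used in the experiments, so the Metropolis criterion, the geometric cooling schedule, the default values, and all names here are assumptions. The toy problem (walking an integer toward a target) only stands in for rewriting a netlist.

```python
import math
import random


def sa_accept(delta, temperature, rng=random):
    """Metropolis rule: always accept improvements, sometimes accept worse."""
    if delta <= 0:
        return True
    if temperature <= 0:
        return False
    return rng.random() < math.exp(-delta / temperature)


def anneal(state, propose, cost, t_start=2.0, cooling=0.9, steps=50, seed=0):
    """Generic annealing loop with an assumed geometric cooling schedule."""
    rng = random.Random(seed)
    current, current_cost = state, cost(state)
    best, best_cost = current, current_cost
    temperature = t_start
    for _ in range(steps):
        candidate = propose(current, rng)   # e.g. apply one random rule
        delta = cost(candidate) - current_cost
        if sa_accept(delta, temperature, rng):
            current, current_cost = candidate, current_cost + delta
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        temperature *= cooling
    return best, best_cost


def step(x, rng):
    """Toy move: shift the integer state by one in a random direction."""
    return x + rng.choice((-1, 1))


def distance_to_target(x):
    """Toy cost standing in for the SWAP count of a rewritten netlist."""
    return abs(x - 7)


best_state, best_value = anneal(0, step, distance_to_target)
print(best_state, best_value)
```

Each iteration re-evaluates the cost of a candidate, and worsening moves are explored as well as improving ones. This is consistent with the longer run times reported for SA, although the actual overhead depends on the schedule used.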
Figure 4 depicts the behavior of the number of SWAP gates obtained by the two approaches in accepting or rejecting a substitution. Although the improvement obtained by our approach depends on the structure of a circuit, more templates can potentially be found in a circuit as its number of gates increases. Therefore, as the figure implies, the improvement grows with the number of gates.


Conclusions
The idea behind this paper is to present equivalent models with the same level number and to use them in the physical synthesis of quantum circuits. Physical synthesis involves making local changes in the netlist to improve design criteria, including the delay of quantum circuits. In this paper, a number of templates are proposed, and substituting these templates in different benchmark circuits yields an improvement in the number of SWAP gates. To evaluate the proposed templates experimentally, nearest-neighbor architectures were selected as the substrate architecture. The results show that the template-matching approach reduces the number of SWAP gates by up to 41%.
To evaluate our greedy approach to deciding on a substitution, we compared it with the SA approach. The results showed that the SA method improves the number of SWAP gates only marginally, while its run time is about 6.49 times longer.