Quantum Arithmetic for Directly Embedded Arrays

We describe a general-purpose framework to design quantum algorithms relying upon an efficient handling of arrays. The cornerstone of the framework is the direct embedding of information into quantum amplitudes, thus avoiding the need to deal with square roots or to encode the information in registers. We discuss the entire pipeline, from data loading to information extraction. Particular attention is devoted to the definition of an efficient toolkit of quantum arithmetic operations on arrays. We comment on the strong and weak points of the proposed manipulations, especially in relation to an effective exploitation of quantum parallelism. Finally, we give explicit examples regarding the manipulation of generic oracles.


Introduction, motivation and main results
Quantum hardware and software are still in their early days of development, thus the design of quantum algorithms typically focuses on low-level operations. Although one should always keep in mind the hardware limitations, especially when describing possible near-term implementations of quantum algorithms, it is convenient to pursue higher levels of abstraction. Apart from its long-term and algorithmic interest, a more abstract and standardized approach serves practical purposes too, for example that of making the benchmarking of quantum-computer performance a more solid and transparent process. In turn, this helps push research and development in quantum computation at all levels.
In the present paper, we describe a novel framework for the design of quantum algorithms on a more abstract plane. To this aim, our first proposal consists in the definition of a quantum matrix, namely a quantum state organized in two registers,

|ψ⟩ = Σ_{i=0}^{I−1} Σ_{j=0}^{J−1} c_ij |i⟩_{n_I} ⊗ |j⟩_{n_J} ,   (1)

where |j⟩_{n_J} indicates a register composed of n_J qubits corresponding to J = 2^{n_J} states, while |i⟩_{n_I} is a register composed of n_I qubits corresponding to I = 2^{n_I} states. The overall state |ψ⟩, as defined in (1), is manifestly presented with the structure of a matrix; specifically, we interpret i as the index running over the rows and j as the index running over the columns. The rightmost qubit within a register is associated with the least significant digit of the associated index, in binary notation. This way of storing the information has a common ground with the Flexible Representation of Quantum Images (FRQI) and the Novel Enhanced Quantum Representation (NEQR) [Le+11; Zha+13]. The main difference with FRQI and NEQR is that we codify the information of the (i, j) entry of the matrix in the quantum amplitude c_ij. Intuitively, the matrix (1) is a bi-dimensional memory array where c_ij encodes the information stored in the |i⟩_{n_I} ⊗ |j⟩_{n_J} memory location (see Figure 1). The second proposal that we describe in the present work is a key technical feature of how we encode the information into the quantum amplitudes, the so-called direct embedding [Kub+20]. Namely, the information to be stored into the quantum matrix is directly loaded into the amplitudes without taking square roots, as is instead usually done in the literature. Such a loading choice has several important implications in later stages of the quantum algorithms and, most importantly, the information stored into the quantum state is handled and combined more easily, because algebraic operations are not hampered by the presence of square roots.
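To make the direct embedding concrete, here is a minimal NumPy sketch (our own illustration, not code from the paper): a small real matrix is normalized and written straight into the amplitudes of a statevector, with the row register as the more significant one, so entry (i, j) sits at basis index i·J + j.

```python
import numpy as np

# Direct embedding of a 2 x 4 matrix c_ij into the amplitudes of a
# 3-qubit state |psi> = sum_ij c_ij |i>|j>: no square roots are taken.
c = np.array([[0.1, 0.2, 0.3, 0.4],
              [0.4, 0.3, 0.2, 0.1]])
I, J = c.shape
psi = c.flatten() / np.linalg.norm(c)   # normalize so <psi|psi> = 1

# reading back the "memory location" (i, j) = (1, 2)
i, j = 1, 2
amp = psi[i * J + j]                    # equals c[1, 2] / ||c||
```

The amplitude itself, not its square root, carries the stored value, which is what later makes linear manipulations of the arrays transparent.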
This allows us to define an "arithmetic library" composed of many fundamental arithmetic operations to handle arrays stored in the quantum matrix in an efficient manner. Such a "general purpose" library provides a versatile framework for the implementation of wide classes of algorithms. In this work, we provide some simple example algorithms without aiming to be exhaustive. The possibility of implementing arithmetic operations within a quantum framework has been considered in the literature since the early days of quantum computation. Apart from the quantum implementation of logical circuits corresponding to basic operations, like the quantum adder [Dra00; Cuc+04], the manipulation of "continuous" numbers has also been studied. Let us mention some works which, at least in spirit, are closer to ours [Wan+20; VBE96; LB99; XY19]. The difference with such approaches is that we use a new embedding and organize the information into a matrix (58); these two aspects combined allow us to work in a transparent and simple manner. For the same reason, the extraction of the information at the end of the quantum circuit requires strategies adapted to our encoding.
The third and final proposal in this paper is to give a full overview of the complete pipeline, or overall structure, of the generic algorithm admitting implementation within this framework. The first step of every algorithm corresponds to loading some input data. In the quantum case, it is often convenient to split this step into two sub-steps:
• loading a probability distribution p_j;
• loading a bi-dimensional function f_ij (possibly by means of methods that load information a line at a time).
It is not strictly necessary to split the loading into two steps. Yet, we consider such a splitting because, typically, we adopt different loading techniques for them: the probability distribution is loaded with a state-preparation algorithm (e.g. a multiplexor binary tree), while the function is loaded by means of an auxiliary qubit meant to tell "good" and "bad" states apart. We describe the first step of the pipeline in Section 2.
In Section 3 we describe the second step of the pipeline corresponding to the implementation of various arithmetic operations, typically at the level of entire arrays or sub-arrays, and we refer to it as quantum arithmetic. In Section 4 we describe the last step of the pipeline, which corresponds to extracting the information that we have stored in the quantum state, namely the read-out of the state that encodes the result of the algorithm.
Figure 2: Diagrammatic structure of the pipeline. The dark grey nodes represent the three main steps; the nodes in red correspond to algorithms that are not efficient, for those in yellow the efficiency varies from case to case, and the nodes in green represent efficient algorithms.
One of the advantages of organizing the pipeline as in Figure 2 is that it enjoys a modular structure; therefore we can develop and analyse each of the steps independently, achieving a better understanding of the problems in each domain. The color coding corresponds to the efficiency of the single modules. In particular, an overall efficient algorithm would correspond to an end-to-end green path from left to right across the diagram. In searching for possible implementations of a desired algorithm, the challenge is to improve the necessary blocks so as to follow a completely green path.

Data loading
Data loading is a generic step which is required essentially in any quantum algorithm. The actual data to be loaded can vary in nature and serves different purposes. The data can correspond -for example-to the discretization of a normalized general real function f defined on a two-dimensional domain. This is, for instance, the typical setup needed for many tasks in mathematical finance, where the two dimensions represent an underlying price and time, respectively. Having in mind applications to finance, we focus on the loading of a real function upon a probability distribution. Yet, we can as well think of the loading of more general data corresponding to a complex matrix, as long as the normalization of the quantum state is respected.
The recipe described in the appendix works in a pointwise fashion, exploiting an auxiliary register to store the desired value into the quantum amplitude at each "memory address", namely, to store it into the associated entry of the quantum matrix. It is important to underline from the outset that this pointwise approach is generically not efficient. In order to attain efficiency at the level of the full algorithm, we need to assume that the loading procedure can be implemented in an alternative and efficient way; in other words, we need to assume the existence of a suitable efficient oracle. Nonetheless, as we will show and stress later, a set of efficient manipulations for generic arrays is possible even when their loading is not efficient. This observation stems directly from the modular structure described in the pipeline of Figure 2.
There are two different aspects related to the complexity of the state preparation: the quantum circuit complexity, on one side, and the complexity of the pre-processing algorithm (where needed), on the other. The former expresses a count of quantum operations or some quantitative estimation of the depth of the quantum loading circuit; the latter refers to the possible pre-processing needed to compute the case-specific values for the parameters of the quantum loading circuit. 4 Here, we are going to discuss only the former, namely, the circuit complexity. To this purpose, we adopt the customary complexity indicator which simply relies on a count of the necessary CNOT gates. This is motivated by the fact that CNOTs are considerably more error-prone and require a longer execution time than single-qubit gates, as commented, for instance, in [SBM06].
Loading a generic real array is not a trivial problem. In Appendix A.1 we referred to a pointwise loading, without worrying about its optimization. 5 In this regard, the state of the art is currently set by two alternative approaches [Bru20]: one based on multiplexors [MV05; SBM06] and the other one based on Schmidt's decomposition [PB11; Pat13]. Both approaches give essentially the same leading CNOT complexity, namely, a number of CNOTs which scales as 2^{n+1} for the preparation of a generic n-qubit state.
Let us stress that, in the very specific case where we need to load a constant array, the procedure of Appendix A.2 requires (in the worst-case scenario) n_I X-gates, two y-rotations and one multi-controlled NOT gate. These numbers must be compared with their classical counterpart, where the loading of a constant array on a line of the I × J matrix requires J operations, considering that copying a single number from memory counts as one operation. Therefore, the loading of a constant function is more interesting from the quantum speed-up perspective than the pointwise loading of a generic function.
Indeed, in principle, we need exponentially fewer operations on a quantum computer to load a constant array. Interestingly, the number of operations needed does not depend on the length of the constant array that we want to load, but it does depend on the number of rows of the matrix that we have to control. Here we can directly see the nature of quantum systems in practice: there is an "extra" cost associated with acting on a single element of the system without impacting the others. This makes operations on single elements inefficient and operations on the whole structure very efficient.

Quantum arithmetic
In the present section we provide a collection of tools for the efficient arithmetic handling of arrays encoded into a quantum matrix through direct embedding. These tools have been implemented and tested using Qiskit [ANI+21]. We present a selected set of implemented operations. Other operations potentially implementable within this framework are -for instance-those described in [CW12].

Ordering
The first operations that we introduce are those which allow us to move elements within the quantum matrix. Manipulating single elements in the matrix has a much higher cost than performing operations on the whole structure. For this reason, we first introduce a global reversing operation and then we introduce generic permutations.

Reversing
By reversing we mean the operation that maps the row (c_i0, c_i1, ..., c_i,J−1) into (c_i,J−1, ..., c_i1, c_i0); for concreteness, we address the reversing operation on the i-th row of the quantum matrix. Note that it is straightforward to perform the reversing operation on a column instead. We divide the process in three steps:
• Mask the row. In this case we only need to mask the register corresponding to the row (the |·⟩_{n_I} register) and leave the column register untouched. For more information on this operation see Appendix A.1.
• Apply n_J controlled X-gates. The control qubits are those of the row register; the target qubits are those of the column register.
• Undo step one, by applying again the same masking operation as before.
Following the steps above, we can perform a reversing operation on any row of the quantum matrix. If we wanted instead to reverse the whole matrix, the operation would be even more efficient than reversing a single row or column. In that case there is no need to control on any qubit: we just apply an X-gate to each qubit of both registers of the quantum matrix.
As an explicit example, let us think of an I × J quantum matrix and, for simplicity, let us consider I = 2, hence n_I = 1 and n_J = log_2(J). Suppose that we have loaded the quantum matrix

|χ_1⟩ = Σ_j ( c_0j |0⟩ ⊗ |j⟩_{n_J} + c_1j |1⟩ ⊗ |j⟩_{n_J} ) .

In order to reverse the first row, we start by applying an X-gate to the row register, obtaining the state

|χ_2⟩ = Σ_j ( c_0j |1⟩ ⊗ |j⟩_{n_J} + c_1j |0⟩ ⊗ |j⟩_{n_J} ) .

Now the row on which we are focusing, namely the one corresponding to c_0j for j = 0, ..., J − 1, has all the qubits in the row register set to one (in this case the row register is just one qubit). So, by means of controlled operations on |χ_2⟩, we act only on the row c_0j. Specifically, we apply an X-gate, controlled on the row register, to every qubit of the column register. This yields

|χ_3⟩ = Σ_j ( c_0,J−1−j |1⟩ ⊗ |j⟩_{n_J} + c_1j |0⟩ ⊗ |j⟩_{n_J} ) .

Finally, we undo the mask by applying again an X-gate to the row register, obtaining

|χ_4⟩ = Σ_j ( c_0,J−1−j |0⟩ ⊗ |j⟩_{n_J} + c_1j |1⟩ ⊗ |j⟩_{n_J} ) .
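The three steps above can be simulated on the statevector; the sketch below (helper names are ours) reproduces their net effect: for basis states whose row index matches the masked row, every column-register bit is flipped, which maps j to J − 1 − j.

```python
import numpy as np

# Simulated net effect of mask + controlled X's + unmask on a 2 x 4
# quantum matrix: row `row` is reversed, the other row is untouched.
nI, nJ = 1, 2
I, J = 2 ** nI, 2 ** nJ
c = np.array([[1., 2., 3., 4.],
              [5., 6., 7., 8.]])
psi = (c / np.linalg.norm(c)).flatten()      # basis index = i*J + j

def reverse_row(psi, row, J):
    out = np.empty_like(psi)
    for idx, amp in enumerate(psi):
        i, j = divmod(idx, J)
        if i == row:              # after masking, the controlled X's fire
            j = (J - 1) - j       # X on every column qubit: j -> J-1-j
        out[i * J + j] = amp
    return out

rev = reverse_row(psi, 0, J).reshape(I, J)
# rev[0] is row 0 reversed; rev[1] is unchanged
```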

Permutations
Permutations of two elements of an array, i.e. swaps of two entries, are demanding operations, as we have to manipulate individual elements instead of whole blocks of the quantum matrix. For the sake of simplicity, in what follows we discuss the algorithm referring to a quantum matrix given by a single row; generalizing to larger matrices is straightforward. It is relevant to point out that the extension to higher dimensionalities, from bi-dimensional matrices to d-dimensional tensors, is also doable, yet it requires additional controlled operations. 7 Specifically, consider the state

Σ_j f_j |j⟩_{n_J} ,   (8)

and let us write it in the notation (f_0, f_1, ..., f_{J−1}), which is more convenient for understanding how the different gates act on the order of the components. The strategy presented here to perform a permutation of two arbitrary elements in (8) consists in using a pivot. That is, we choose a fixed position k (the pivot) and implement the permutations of the component placed at position k with any other component of the array. Once this is done, the generic swap of two elements can be obtained by means of three operations at most. For example, if we aim to permute the elements at positions i and j in (..., f_i, ..., f_j, ..., f_k), we would need to perform the following three steps. First, we permute the positions j ⇔ k, obtaining (..., f_i, ..., f_k, ..., f_j). Then, we consider the permutation of positions i ⇔ k, which realizes the permutation of elements i ⇔ j, yielding (..., f_j, ..., f_k, ..., f_i). Finally, we perform again step one, obtaining the desired permuted state, namely (..., f_j, ..., f_i, ..., f_k). Now, the key of the algorithm is to understand how to actually perform the permutations with the pivot in practice. They can be implemented through X-gates and controlled X-gates. Moreover, without loss of generality, we choose the last element of the register as the pivot. If we have n_J qubits, the single X-gates acting on state (8) have the effects described in Table 1.

Table 1: effect of the single X-gates on state (8) (gate, old state, new state).

From Table 1 we can see that the single X-gates perform swaps of blocks of contiguous memory positions. When we act on more significant qubits we affect bigger blocks, and each gate affects the whole state. In this algorithm we are only interested in the effect that the gate has on certain blocks of the array (the highlighted ones). Using multi-controlled X-gates, where the controls are applied to all qubits except the one targeted by the X-gate, and acting on state (8), we get the results reported in Table 2.

Table 2: effect of the multi-controlled X-gates on state (8) (gate, old state, new state).

In this case it is clear that the effect of the controlled operations is to permute the last elements of each highlighted block. We need to combine both operations, X-gates and multi-controlled X-gates, to perform the permutation of any element with the pivot (the last element, according to our choice). The strategy can be implemented recursively in the following way: 1. Move the last element of the array to the block where the element we wish to permute is located. This is done through a suitable multi-controlled x-gate.
2. If at this point the two elements that we wanted to interchange have been actually swapped, then undo all previous operations (both x-gates and multi-controlled x-gates) except for the last one and finish. These operations are needed to bring back to their original position all the other elements except the pair that has been swapped. Otherwise continue.
3. Swap the blocks on which we have acted at step 1. This is done through a single x-gate and serves the purpose of moving to the right the block on which we need to focus.
4. Go back to step 1.
For the sake of clarity, let us give a simple explicit example. Consider the state (f_0, f_1, ..., f_7) and suppose we want to permute the first element f_0 with the pivot element f_7. We can proceed as follows: a multi-controlled X-gate first moves f_7 into the half of the array containing f_0, and an X-gate then swaps the two halves; repeating the same pattern on smaller and smaller blocks eventually brings the pivot next to f_0, where a last multi-controlled X-gate exchanges them. Now that we have effectively swapped the element 0 and the element 7, we just have to relocate the rest of the elements by undoing the previous operations (Step 2).
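The whole example can be checked numerically. In the sketch below (helper names are ours) an X on qubit q permutes index j → j XOR 2^q, while a multi-controlled X exchanges only the two indices whose other bits are all 1; the gate sequence follows the recursive strategy of the text, including the final unwinding, and swaps f_0 with f_7 while leaving everything else in place.

```python
import numpy as np

f = np.arange(8, dtype=float)       # stand-in for the amplitudes (f_0,...,f_7)

def x(f, q):                        # X on qubit q: index j -> j XOR 2^q
    return f[np.arange(len(f)) ^ (1 << q)]

def mcx(f, q):                      # MCX targeting q, controlled on the rest
    out = f.copy()
    a = len(f) - 1                  # index 11...1 (the pivot position)
    b = a ^ (1 << q)                # same index with the target bit cleared
    out[[a, b]] = out[[b, a]]
    return out

# forward passes MCX(2), X(2), MCX(1), X(1), MCX(0), then unwind the rest
for op, q in [(mcx, 2), (x, 2), (mcx, 1), (x, 1), (mcx, 0),
              (x, 1), (mcx, 1), (x, 2), (mcx, 2)]:
    f = op(f, q)

print(f)   # [7. 1. 2. 3. 4. 5. 6. 0.] -- only f_0 and f_7 are exchanged
```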

Cyclic permutations
Cyclic permutations correspond to the two transformations
Left : (f_0, f_1, ..., f_{J−1}) → (f_1, ..., f_{J−1}, f_0) ,
Right : (f_0, f_1, ..., f_{J−1}) → (f_{J−1}, f_0, ..., f_{J−2}) ,
where we follow the same notation adopted in Section 3.1.2. These operators have been discussed in depth in [Li+14] and their implementation can be immediately extended to our framework, upon adding suitable controls.

Addition
In this subsection we discuss both the sum of whole arrays and the sum of their components (reduction).

Sum
Consider the state |ψ_1⟩ given in (61), where we omit the auxiliary register |·⟩_a for convenience; we refer to the resulting state as (16). Applying a Hadamard gate on the first qubit of the row register, we get the sums and differences of the rows grouped in pairs, as displayed in (18). In the first row of (18), we get the sum of the first and second row of (16). In the second row of (18), instead, we get the difference between the first and the second row of (16). In the third row of (18), we get the sum of the third and fourth row of (16), and the same structure continues on. An analogous sum/difference operation can be performed on columns. Note that, in order to keep track of the correct number of 1/√2 factors, we need to count the Hadamard gates that we apply. Finally, to sum two rows that are not in the same pair, we can take advantage of the permutations described in Section 3.1.2.
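For a two-row matrix the operation reduces to a single Hadamard on the row qubit; the following NumPy sketch (our own illustration) checks the resulting sum and difference rows:

```python
import numpy as np

# H on the row qubit of a 2 x J quantum matrix maps the two rows into
# (row0 + row1)/sqrt(2) and (row0 - row1)/sqrt(2).
J = 4
c = np.array([[1., 2., 3., 4.],
              [4., 3., 2., 1.]])
psi = (c / np.linalg.norm(c)).flatten()      # basis index = i*J + j

H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
phi = (np.kron(H, np.eye(J)) @ psi).reshape(2, J)
# phi[0] holds the sum of the rows, phi[1] their difference (each / sqrt(2))
```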

Reductions
By reduction we mean the summation of all the elements of an array, where the result is stored in the first entry of the array. Consider again the state defined in (16). In order to perform a reduction by rows (i.e. summing the elements of each row and storing the result in the first column), we just need to apply a Hadamard gate to every qubit of the column register. The parenthesis in the resulting expression indicates that we obtain the reduction of each row in the first column (which corresponds to |0⟩_{n_J}). In the rest of the columns, we get other reductions with different combinations of signs, as implemented by the Walsh-Hadamard operator in (21). If we were to perform a reduction by columns instead of by rows, we would need to apply the Walsh-Hadamard gate to the row register instead of the column register; correspondingly, we would get the reduction of the columns in the first row.
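The reduction by rows can be verified directly on the statevector (a sketch with our own indexing convention): applying H to every qubit of the column register leaves the sum of each row, divided by √J, in the first column.

```python
import numpy as np

nJ = 2
J = 2 ** nJ
c = np.array([[1., 2., 3., 4.],
              [4., 3., 2., 1.]])
psi = (c / np.linalg.norm(c)).flatten()

H = np.array([[1., 1.], [1., -1.]]) / np.sqrt(2)
W = np.eye(1)
for _ in range(nJ):
    W = np.kron(W, H)                 # Walsh-Hadamard on the column register
phi = (np.kron(np.eye(2), W) @ psi).reshape(2, J)
# phi[:, 0] holds sum_j c_ij / sqrt(J) (up to the state normalization)
```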

Products
In this subsection we consider the product of a whole array by a constant and the product of two arrays. Finally, the scalar product of two arrays can be obtained by composing the product of two arrays with a reduction. Similarly, the squaring of an array can be obtained as the product of the array with itself.

Multiplication by a constant
In order to multiply a row or a column by a constant, we need an extra qubit, which we denote |0⟩_mul. Consider the state defined in (61), this time supplemented with the extra qubit. The multiplication operation consists merely of a controlled rotation. The rotation is applied to the auxiliary register |0⟩_mul and introduces a factor α = cos(θ/2), so we are initially restricted to multiplication by numbers between 0 and 1. This limitation can be circumvented by means of suitable manipulations of the normalization constants.
Depending on the controls that we apply, we can multiply a row, a column or a specific individual entry by α. For example, assume that we want to multiply the first row by α. In order to act solely on the first row, we first have to mask it by applying

|ψ_2⟩ = (X^{⊗ n_I} ⊗ 1_{n_J}) |ψ_1⟩ .   (24)

The next step is to perform the controlled y-rotation (26), where we indicate the controls of the controlled rotation with the symbol C. Thus, (26) is to be interpreted as a controlled y-rotation acting on |0⟩_mul and controlled by the row register. Eventually, we have to unmask the state, which yields (28); the relevant information is marked by |0⟩_mul.
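The net effect of mask, controlled rotation and unmask can be sketched directly on the amplitudes (the (mul, i, j) indexing convention is ours): the targeted row picks up α = cos(θ/2) in the |0⟩_mul block, with the residual sin(θ/2) component marked by |1⟩_mul.

```python
import numpy as np

alpha = 0.6
theta = 2 * np.arccos(alpha)
c = np.array([[1., 2.],
              [3., 4.]])
c = c / np.linalg.norm(c)

psi = np.zeros((2, 2, 2))      # indexed as (mul, i, j), all in |0>_mul
psi[0] = c

out = psi.copy()
out[0, 0] = np.cos(theta / 2) * psi[0, 0]   # first row picks up alpha
out[1, 0] = np.sin(theta / 2) * psi[0, 0]   # residual marked by |1>_mul
# the |0>_mul block now holds (alpha * row0, row1); the norm is preserved
```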

Array multiplication
In the present section, we describe the theoretical proposal for a more advanced operation: the multiplication of arrays. Its (overall) efficiency is related to that of the loading process. Let us assume to have at our disposal two oracles O_f and O_g, acting as

O_f : |0⟩_f ⊗ |j⟩ → ( f_j |0⟩_f + √(1 − f_j²) |1⟩_f ) ⊗ |j⟩ ,

and analogously for O_g, which load the arrays f and g respectively. Moreover, consider the swap operator S |α⟩ ⊗ |β⟩ ⊗ |j⟩ = |β⟩ ⊗ |α⟩ ⊗ |j⟩.
To build the multiplication operator we start from the state |0⟩_f ⊗ |0⟩_g ⊗ Σ_j p_j |j⟩, where we have two auxiliary qubits: |0⟩_f to load f and |0⟩_g to load g. First we load f by applying O_f. In the second step we swap the qubits |·⟩_f and |·⟩_g. The third and last step consists in applying the oracle O_g. The multiplication of the arrays f and g is then encoded in the components marked by |0⟩_f ⊗ |0⟩_g. 9 This procedure can be extended to the multiplication of more than two arrays. It is worth noting that this method depends on the loading complexity, that is, on the efficiency of the employed oracles. As a final comment, when in Section 1 we split the loading of the integrand f · p into two parts, that associated to the distribution and that associated to the function, we were in fact performing the multiplication of the two arrays {p_j} and {f_j}.
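The three steps can be sketched numerically by writing the oracle action directly on the amplitudes (`oracle` is our own helper, valid because the targeted auxiliary axis is entirely in |0⟩ at each call):

```python
import numpy as np

f = np.array([0.1, 0.5, 0.9, 0.3])
g = np.array([0.8, 0.2, 0.4, 0.6])
J = len(f)

def oracle(psi, arr, axis):
    # |0> -> arr_j |0> + sqrt(1 - arr_j^2) |1> on the given auxiliary axis
    zero = np.take(psi, 0, axis=axis)
    return np.stack([arr * zero, np.sqrt(1.0 - arr ** 2) * zero], axis=axis)

psi = np.zeros((2, 2, J))                 # indexed as (qf, qg, j)
psi[0, 0] = np.full(J, 1 / np.sqrt(J))    # uniform distribution p_j

psi = oracle(psi, f, axis=0)              # step 1: O_f on qubit f
psi = psi.transpose(1, 0, 2)              # step 2: swap qubits f and g
psi = oracle(psi, g, axis=0)              # step 3: O_g on the fresh qubit

# psi[0, 0, j] now equals f_j * g_j * p_j, the componentwise product
```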

Squaring and scalar product
The square of an array and the scalar product of two arrays can be obtained from operations that we have already defined above. The former is trivially just the multiplication of an array by itself. On the other hand, if we perform the reduction of the product of two arrays, we get their scalar product. As their construction depends on the steps commented in Section 3.3.2, the efficiency of the square of an array and the scalar product of two arrays is strongly dependent on the loading strategy for the arrays.

Information extraction
In this section we describe a method to extract the information stored in the quantum circuit at the end of the algorithm. The method is called Quantum Coin (QCoin) [AW99; SH20] and it can be applied in general to extract the absolute value of the amplitude a along a state |χ⟩. Although in this article we rely on the QCoin algorithm, there are other techniques which can be considered, such as those appearing in [Bra+00; Suz+20; Gri+20; Giu+20]. Let us suppose that we want to extract the amplitude of the state for some specific values i and j; for example, we want to read one entry of the quantum matrix defined in (1). The QCoin algorithm starts with an unamplified estimation of the amplitude along |χ⟩, which is then iteratively refined. Such unamplified estimation consists in a repeated set of identical experiments and measurements, which allows us to get an empirical value μ̂ for the absolute value of the desired amplitude. This estimation provides a confidence interval relying on standard statistical tools, e.g. Chebyshev's inequality. Then, one can pursue an iterative Grover amplification strategy, implemented as a "zoom-in" operation into the confidence interval estimated previously. This amounts to an iterative exponential refinement of the accuracy of the estimation of μ̂ [AW99]. We give some additional details below.
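The unamplified first stage can be sketched as follows (a synthetic simulation with numbers of our choosing, not data from the paper): the probability p = a² is estimated from repeated measurements and a Chebyshev confidence interval is built around it.

```python
import numpy as np

rng = np.random.default_rng(0)
a = 0.3                       # true amplitude along |chi> (assumed here)
shots = 10_000
hits = rng.binomial(shots, a ** 2)
p_hat = hits / shots
mu_hat = np.sqrt(p_hat)       # empirical estimate of |a|

# Chebyshev: P(|p_hat - p| >= eps) <= Var/eps^2 <= 1/(4*shots*eps^2) = delta
delta = 0.05                  # tolerated failure probability
eps = np.sqrt(1 / (4 * shots * delta))
lower, upper = max(p_hat - eps, 0.0), min(p_hat + eps, 1.0)
# [sqrt(lower), sqrt(upper)] is the interval that seeds the amplified zoom-in
```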

Amplified iterations
An amplified stage consists of two operations:
• A constant shift of the amplitude a by the lower bound a_low of the confidence interval (39) of its empirically estimated value, namely ā = a − a_low (40). By construction, the shifted amplitude ā is expected to have a positive value of the order of half the size of the confidence interval. 11
• An enhancement of the probability P of measuring |χ⟩. Specifically, by means of Grover amplification, we obtain a state where the probability P̂ of getting |χ⟩ is P̂ = γP, with γ > 1. An estimation of P̂ with a given precision corresponds to an estimation of P with precision increased by the factor γ.
We define the Grover amplifier as usual, as the composition of two reflections, where σ is the state |σ⟩ ≡ |1⟩_a ⊗ |0⟩_{n_I} ⊗ |0⟩_{n_J}, and where O_ā represents the operator which implements the algorithm followed by the shift (40). Note that the state |σ⟩ is the entry marked as 00 in the quantum matrix (58). We denote with R_σ and R_χ the reflection operators about the states O_ā |σ⟩ and |χ⟩, respectively. If we define the angle θ such that sin(θ) ≡ ⟨χ|Ψ⟩, where |Ψ⟩ is the final quantum state, we have P = sin(θ)² and, after k applications of the amplifier, P̂ = sin((2k + 1)θ)². The corresponding amplification factor for the probability of measuring |χ⟩ is γ = sin((2k + 1)θ)² / sin(θ)². In order to maximize γ we need to choose k so that (2k + 1)θ is as close as possible to π/2.
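A quick numeric sketch (numbers are our own illustration) shows how large the amplification factor can get for a small initial probability:

```python
import numpy as np

# With P = sin(theta)^2, k Grover iterations give P_hat = sin((2k+1) theta)^2,
# so gamma = P_hat / P is maximized when (2k+1) theta is close to pi/2.
P = 0.01
theta = np.arcsin(np.sqrt(P))
k = int(round((np.pi / (2 * theta) - 1) / 2))   # (2k+1) theta ~ pi/2
P_hat = np.sin((2 * k + 1) * theta) ** 2
gamma = P_hat / P
# for P = 0.01 this gives k = 7 and gamma close to 100
```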

Examples
In this section we give two explicit examples of manipulations for a given oracle.

Constant shift of a given oracle
Given an oracle O_f which loads the function f, we are interested in providing an efficient implementation of an oracle O_f̃ for the shifted function f̃_j = f_j − s, where s is a generic real constant. Note that constructing the oracle for f̃ is non-trivial whenever f is non-constant. 12 More specifically, consider the state |φ_1⟩ obtained by applying the oracle O_f to the initial state |ψ_0⟩ given in (59). Note that the state |φ_1⟩ can be interpreted as having the function f loaded on the first row of a 2 × J quantum matrix. Following the steps described in Appendix A.2, we can load the constant function s into the second row of |φ_1⟩; specifically, we load a constant array on the states associated to i = 1. It is important to remember that the quantum register |i⟩_{n_I} stores the row address of the arrays, while |j⟩_{n_J} stores the column address.
Following Section 3.2.1, we then apply a Hadamard gate to the row qubit, combining the two rows of the stored matrix. Focusing on the components |0⟩_a ⊗ |1⟩ ⊗ |j⟩_{n_J}, we see that we have stored the difference array f − s, i.e. the array f shifted by the constant s, in the second line of the matrix. Note that, at the same time, we have stored the sum array f + s in the first line.

Discussion and conclusion
The main goal of this work is to propose and describe a generic framework for the design of quantum algorithms based on direct embedding. Its modular structure, as depicted in Figure 2, is appealing and handy in a number of ways. For example, under this framework the main components of a quantum algorithm, namely data loading, arithmetic manipulations, and read-out, can be studied and discussed separately. This holds true also for the considerations related to efficiency, whose current status is reflected in the color coding of Figure 2; specifically, an end-to-end efficient pipeline would be represented by a left-to-right path within the diagram that encounters only green boxes. Thus, the modular structure of the pipeline for the generic quantum algorithm helps to organise the research effort, compare and interpret different algorithms, and identify possible bottlenecks. Furthermore, it is possible to combine this framework with other existing routines. For instance, one can adopt one's favourite amplitude amplification and estimation technique for the information-extraction part.
On a more technical level, the direct embedding of information into the quantum amplitudes avoids having to deal with square roots and thereby it opens the way to easier arithmetic manipulations of the data stored in the quantum state. In particular, we defined the quantum matrix, a two-dimensional array which can be thought of in analogy to a memory register: the basis states correspond to the row and column addresses of the memory locations, while the entries of the matrix are the quantum amplitudes representing the loaded information. As it has been previously illustrated, this construction allows for neat and flexible manipulation of arrays. We have also covered some basic arithmetic manipulations, for which we provided descriptions and implementation details. All in all, we set up a theoretical proposal for a package of arithmetic operations in a quantum framework. Its full potential and development requires further investigation and work, with particular focus on the loading and read-out modules.
Quantum matrices can be naturally generalized to multi-dimensional arrays. All the arithmetic manipulations proposed, as well as the loading and read-out techniques, can be extended in a straightforward way to the higher-dimensional and more general tensor setting. However, this comes at the cost of possibly requiring additional controlled operations for "masking" the array and acting only on a desired subset of entries. In other words, the cost of an operation is related to the co-dimension of the subset of entries to which it applies.
Finally, we also provided two specific example applications that are interesting on their own, beyond the discussions of the present work: the shift of a generic oracle by a constant and the shift by a piece-wise constant approximation of a linear function. We note that their efficient implementation depends on the efficiency of the oracle to which the shift is applied. A constant shift for an oracle implements a vertical offset and is useful, for example, in iterative algorithms where at each iteration an output oracle needs to be centered vertically, i.e. along the y axis.

A Details on data loading

A.1 Pointwise loading of a matrix
In this subsection we show how to load a generic matrix (i.e. a two-dimensional array) into a quantum matrix (1) in a pointwise fashion. We consider states of the form (58), which correspond to the quantum matrix introduced in (1) with the addition of an auxiliary register |·⟩_a. In what follows, let us assume for simplicity that the auxiliary register consists of just one qubit; the other registers operate as described when discussing Equation (1). The pipeline of a quantum algorithm starts by loading an initial state. For example, this can represent a probability distribution, the simplest case being the uniform distribution; let us consider it explicitly. To load the uniform distribution, we apply the Walsh-Hadamard gate 1 ⊗ H^{⊗ n_I} ⊗ H^{⊗ n_J} to the base state |0⟩_a ⊗ |0⟩_{n_I} ⊗ |0⟩_{n_J}, thus obtaining

(1/√(IJ)) Σ_{i,j} |0⟩_a ⊗ |i⟩_{n_I} ⊗ |j⟩_{n_J} .   (59)

Note that the loading of the distribution has not made use of the auxiliary qubit. The next step in the pipeline is to load a real matrix f into the quantum matrix. To load a point f_ij into the corresponding register we need to act in such a way that we only impact the targeted quantum state. For that purpose we have to perform three steps.
Figure 3: In yellow the row register. In red the column register.
• Mask the state. Masking the state consists in converting the state |0⟩_a ⊗ |i⟩_{n_I} ⊗ |j⟩_{n_J} into the state |0⟩_a ⊗ |I − 1⟩_{n_I} ⊗ |J − 1⟩_{n_J} ≡ |0⟩_a ⊗ |11...11⟩_{n_I} ⊗ |11...11⟩_{n_J}. In terms of qubits, this requires applying a NOT gate to all the qubits that are zero for the original state. The reason for this masking operation will become clear in the next step.
• Apply a suitable controlled y-rotation on the auxiliary qubit. The controls have to be applied on all the qubits except the auxiliary one. Here we can see that the complexity of the algorithm depends on the number of qubits that we have to control. The angle for the rotation needs to be θ = arccos( f_ij / ‖f‖_∞ ), where ‖f‖_∞ is the infinity norm of the matrix f. The factor ‖f‖_∞ is needed to keep the amplitudes bounded so that the associated probabilities do not exceed 1.
• Undo step one. This consists in the application of the same mask already used at step one.
Following this strategy we can load each of the values f_ij, thus getting a state whose generic component reads

f_ij |0⟩_a ⊗ |i⟩_{n_I} ⊗ |j⟩_{n_J} + √(1 − f_ij²) |1⟩_a ⊗ |i⟩_{n_I} ⊗ |j⟩_{n_J} .
Usually we focus only on the states marked with |0⟩_a, namely those of the form f_ij |0⟩_a ⊗ |i⟩_{n_I} ⊗ |j⟩_{n_J}.
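The net effect of the pointwise loading loop can be sketched directly on the amplitudes (the (aux, i, j) layout and helper code are ours): each entry, divided by the infinity norm, lands in the aux = 0 block, with the √(1 − f_ij²) residuals marked by aux = 1.

```python
import numpy as np

f = np.array([[0.2, -0.4],
              [0.9,  0.5]])
I, J = f.shape
fn = f / np.abs(f).max()              # divide by the infinity norm of f

psi = np.zeros((2, I, J))             # indexed as (aux, i, j)
psi[0] = 1 / np.sqrt(I * J)           # uniform distribution (59)

out = np.empty_like(psi)
out[0] = fn * psi[0]                  # cos of the rotation: f_ij / ||f||_inf
out[1] = np.sqrt(1 - fn ** 2) * psi[0]
# out[0] is proportional to the loaded matrix f; the total norm is preserved
```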

A.2 Loading a constant array
Loading a constant array follows essentially the same strategy as the pointwise loading. For the purpose of giving an explicit example, we describe the loading of a constant array, taking a real value c ≤ 1, into a row of the quantum matrix. We start again by loading the uniform distribution, thus obtaining (59). Then, we use a structure similar to the one discussed before for loading the array, namely:
• Mask the state. In this case we only need to mask the register corresponding to the row (the |·⟩_{n_I} register) and leave the column register untouched. As we mask only one register, we need fewer gates than for the pointwise loading described above.
• Apply a suitable controlled y-rotation on the auxiliary qubit. The angle for the rotation needs to be θ = arccos(c). The controls have to act only on the row register. Here we can see that the number of controls needed to load a constant array is drastically reduced with respect to the generic-function case.
• Undo step one, by applying the same mask already considered there.
If we were to load the constant array into the row |i⟩_{n_I} we would get

A ( Σ_j c |0⟩_a ⊗ |i⟩_{n_I} ⊗ |j⟩_{n_J} + √(1 − c²) |1⟩_a ⊗ |i⟩_{n_I} ⊗ |j⟩_{n_J} + ... ) ,

where A is a normalization constant and the ellipsis stands for the unchanged rows.
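The row-wise constant loading can be sketched as follows (net amplitude effect, with our own (aux, row, col) layout); note that the cost is independent of the row length J:

```python
import numpy as np

c_val, i = 0.7, 1
I, J = 2, 4
psi = np.zeros((2, I, J))             # indexed as (aux, row, col)
psi[0] = 1 / np.sqrt(I * J)           # uniform distribution (59)

out = psi.copy()
out[0, i] = c_val * psi[0, i]                      # cos part of the rotation
out[1, i] = np.sqrt(1 - c_val ** 2) * psi[0, i]    # residual on aux = 1
# every column of row i carries c / sqrt(I*J) in the aux = 0 block
```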
As it can be intuitively anticipated, the loading complexity grows together with the lack of symmetry of the loaded state. The constant case, being highly symmetric, is easy. In between the constant and the generic state with no symmetry, one can encounter lower degrees of symmetry, like for example functions which are piece-wise constant. We remind the reader that we adopted piece-wise constant functions in Subsection 5.2 to approximate a linear function and observed how the complexity grew with the approximation accuracy. For further discussions on the relation between the loading complexity and the symmetry of the loaded state we refer to [Bru20].