Pairing Optimization via Statistics: Algebraic Structure in Pairing Problems and Its Application to Performance Enhancement

Fully pairing all elements of a set while attempting to maximize the total benefit is a combinatorially difficult problem. Such pairing problems naturally appear in various situations in science, technology, economics, and other fields. In our previous study, we proposed an efficient method to infer the underlying compatibilities among the entities, under the constraint that only the total compatibility is observable. Furthermore, by transforming the pairing problem into a traveling salesman problem with a multi-layer architecture, a pairing optimization algorithm was successfully demonstrated to derive a high-total-compatibility pairing. However, there is substantial room for further performance enhancement by further exploiting the underlying mathematical properties. In this study, we prove the existence of algebraic structures in the pairing problem. We transform the initially estimated compatibility information into an equivalent form where the variance of the individual compatibilities is minimized. We then demonstrate that the total compatibility obtained when using the heuristic pairing algorithm on the transformed problem is significantly higher compared to the previous method. With this improved perspective on the pairing problem using fundamental mathematical properties, we can contribute to practical applications such as wireless communications beyond 5G, where efficient pairing is of critical importance. As the pairing problem is a special case of the maximum weighted matching problem, our findings may also have implications for other algorithms on fully connected graphs.


Introduction
The procedure of generating pairs of elements among all entries of a given system often arises in various situations in science, technology, and economics [1][2][3][4][5][6][7]. Here, we call such a process pairing, and the number of elements is considered to be an even number for simplicity. One immediately obvious problem is that the number of pairing configurations grows rapidly with the number of elements. The number of possible pairings is given by (n − 1)!!, where n indicates the number of elements in the system and !! is the double factorial operator. For example, when n is 100, the total number of possible pairings is on the order of 10^78. Hence, finding the pairing that maximizes the benefit of the total system is difficult.
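The double-factorial count above can be checked with a short script (a sketch in Python using only the standard library; the helper names are ours, not from the paper):

```python
# Count the number of possible pairings, (n - 1)!!, to illustrate
# the combinatorial explosion described above.
def double_factorial(m: int) -> int:
    """Return m!! = m * (m - 2) * (m - 4) * ... (1 for m <= 0)."""
    result = 1
    while m > 1:
        result *= m
        m -= 2
    return result

def num_pairings(n: int) -> int:
    """Number of perfect pairings of n elements (n even)."""
    return double_factorial(n - 1)

print(num_pairings(4))              # 3 distinct pairings for 4 elements
print(len(str(num_pairings(100))))  # 79 digits, i.e., on the order of 10^78
```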
Notably, the pairing problem corresponds to the maximum weighted matching (MWM) problem on the complete graph. Multiple algorithms exist for solving the MWM problem [8][9][10][11][12][13][14][15]. In contrast to these conventional methods, we propose a heuristic and fast algorithm at the cost of some performance. The advantage of a fast heuristic algorithm is that it can be useful in environments where weights change dynamically or a quick pairing is required, such as in communications technology. A heuristic algorithm for the MWM problem using deep reinforcement learning was recently proposed by [16] with a similar goal. Furthermore, our research proposes algorithms that work under the limited observation constraint, which is explained later. In our previous study, we proposed an algorithm with a computational complexity of O(n^2) [17].
To the best of our knowledge, there is no exact algorithm that works on the order of O(n^2) for arbitrary weights. For example, Gabow [9] proposed an MWM algorithm with a computation time of O(|E||V| + |V|^2 log |V|), where V is the set of vertices and E is the set of edges. However, randomized or approximate algorithms can reduce the computational time in some cases. For example, Cygan et al. [12] developed a randomized algorithm with a computation time of O(L|V|^ω) for graphs with integer weights, where ω < 2.373 is the exponent of n × n matrix multiplication [18] and L is the maximum integer edge weight. Duan et al. [15] proposed an approximate algorithm achieving an approximation ratio of (1 − ε)M with a computation time of O(|E|ε^{-1} log ε^{-1}) for arbitrary weights and O(|E|ε^{-1} log N) for integer weights, where ε is an arbitrary positive value and M is the maximum possible weight matching value. Here, |V| = n and |E| = n(n − 1)/2. In this study, we aim to improve our previous pairing result, i.e., to determine a higher-accuracy heuristic algorithm that retains O(n^2) computational complexity.
Note that the pairing problem should not be confused with the assignment problem, which is another special case of the MWM setting. The assignment problem requires the graph to be a weighted bipartite graph: there are two classes of objects, and the goal is always to match an object from the first class with an object from the second. In the pairing problem, by contrast, there is only a single class of objects, and any element may potentially be paired with any other. The assignment problem is also related to the single-source shortest paths problem. Several well-known assignment algorithms [19][20][21] and single-source shortest paths algorithms [22] exist. For example, the Hungarian algorithm [19] solves the assignment problem in O(n^3) time, the auction algorithm [20] admits parallel implementations, and the Bellman-Ford algorithm runs in O(|V||E|) time [22]. However, in this study, we consider a fully connected graph with an even number of elements, where the MWM problem cannot be solved by assignment problem algorithms.
An example of a pairing problem is found in a recent communication technology called non-orthogonal multiple access (NOMA) [23][24][25][26][27][28][29]. In NOMA, multiple terminals simultaneously share a common frequency band to improve the efficiency of frequency usage. The simultaneous use of the same frequency band causes interference in the signals from the base station to each terminal. To overcome this problem, NOMA uses a signal processing method called successive interference cancellation (SIC) [30] to distinguish individual channel information in the power domain, allowing multiple terminals to rely on the same frequency band. For simplicity, here we consider that the number of terminals that can share a frequency is given by two. Herein, the usefulness of the whole system can be measured by the total communication quality, such as high data throughput and low error rate, which depends crucially on the method of pairing.
The most fundamental parameter of the pairing problem is the merit between any two given elements, which we call individual compatibility, while the summation of compatibilities for a given pairing is called its total compatibility. The detailed definition is introduced below. Our goal is to derive pairings yielding high total compatibility.
In general, we do not need to assume that the individual compatibility of a pair is observable, i.e., only the total compatibility of a given pairing may be observed. Our previous study [17] divided the pairing problem into two phases. The first is the observation phase, where we observe total compatibilities for several pairings and estimate the individual compatibilities. The second is the combining phase, in which a search is performed for a pairing that provides high total compatibility. This procedure is referred to as pairing optimization. The search is based on the compatibility information obtained in the first phase. In [17], we show that the pairing optimization problem can be transformed into a travelling salesman problem (TSP) [31] with a three-layer structure, allowing us to benefit from a variety of known heuristics.
However, we consider that there is substantial room for further performance optimization. This study sheds new light on the pairing problem from two perspectives. The first is to clarify the algebraic structure of the pairing optimization problem. Because we care only about the total compatibility when all elements are paired, there are many compatibility matrices (defined in Section 2) that share the same total compatibilities. In other words, we can consider an equivalence class of compatibility matrices that yield the same total compatibilities and that cannot be distinguished if individual compatibilities are not measurable. We show that the compatibility matrices in each equivalence class have an invariant value.
Second, although any compatibility matrices in the same equivalence class theoretically provide the same total compatibility, the heuristic pairing optimization process can result in different total compatibility values. These differences are not caused by incomplete or noisy observations, but are due to the convergence properties of the heuristic pairing algorithms, which yield better results on some distributions than others. We examine how the statistics of the compatibility matrix affect the pairing optimization problem and propose a compatibility matrix that yields higher total compatibility after optimization. More specifically, we propose a transformation to the compatibility matrix that minimizes the variance of the elements therein, which we call the variance optimization. We confirmed numerically that enhanced total compatibility is achieved via the compatibility matrix after variance optimization. Furthermore, the proposed variance optimization algorithm may also be applicable when no observation phase is required, i.e., when the individual compatibilities are directly observable. In other words, there are cases where a compatibility matrix unsuitable for a heuristic combining algorithm can be converted to one that is easily combinable.
The remainder of this paper is organized as follows. In Section 2, we define the pairing optimization problem mathematically. Section 3 describes the mathematical properties of the equivalence class. Section 4 explains the concept of variance optimization and presents a solution by which it can be achieved. Section 5 presents the results of numerical simulations of the proposed variance optimization. Note that two optimization problems appear in this paper: the first is the pairing problem itself, which we define and aim to solve in Section 2.1; the second is the variance optimization of Section 4.2, which enables us to enhance the performance of the PNN+p2-opt algorithm. Finally, Section 6 concludes the paper.

Problem Setting
In this section, we provide a mathematical definition of the pairing optimization problem that we address in this study, and define some of the mathematical symbols used in the following discussion. In addition, we explain the constraints applied to the pairing optimization problem.

Pairing Optimization Problem
Here, we assume that the number of elements is an even natural number n, while the index of each element is a natural number between 1 and n. Parts of the pairing problem can be described elegantly in set theory, while others benefit from using matrix representations; we will use either, where appropriate. We use U(n) to denote the set of n elements:

U(n) ≡ {1, 2, ..., n}. (1)

Then, we define the set of all possible pairs for U(n) as P(n), which contains n(n − 1)/2 pairs:

P(n) ≡ {{i, j} | i, j ∈ U(n), i ≠ j}. (2)

To describe the compatibilities of these pairs, we now define a "compatibility matrix" C, where the compatibility between elements i and j is denoted by C_i,j ∈ R. The matrix C is always symmetric, and its main diagonal is zero, because pairing i and j does not depend on the order of elements and an element cannot be paired with itself. The set of all possible compatibility matrices is denoted as Ω_n when the number of elements is n. In other words, Ω_n is the set of all n × n symmetric hollow matrices.

To describe a pairing, i.e., which elements are paired together, we now define a pairing matrix S ∈ R^{n×n}, where S_i,j = 1 if elements i and j are paired and S_i,j = 0 otherwise. S is symmetric, because pairing element i with j is equivalent to pairing j with i. The pairing matrix S is also hollow, because pairing i with itself is not allowed. Each row and column contains only a single non-zero element, as each element i can only be paired once. Therefore, a pairing matrix S is an n × n symmetric and hollow permutation matrix. We define the set of all pairing matrices S(n) ≡ {S} when the number of elements is n.

To derive the set representation of a pairing, we introduce the map f_set as follows:

f_set(S) ≡ {{i, j} ∈ P(n) | S_i,j = 1}.

A function denoted by ⟨X, C⟩ is then defined as follows, using the Frobenius inner product ⟨·,·⟩_F:

⟨X, C⟩ ≡ (1/2)⟨X, C⟩_F = (1/2) Σ_{i,j} X_i,j C_i,j,

where the factor 1/2 ensures that each pair is counted once. For a given compatibility matrix C, we call ⟨S, C⟩ for S ∈ S(n) the "total compatibility" for pairing S.
This formulation is equivalent to the one used in our previous work [17], and corresponds to summing the individual compatibilities C_i,j of the pairs defined by S:

⟨S, C⟩ = Σ_{{i,j} ∈ f_set(S)} C_i,j.

For any given compatibility matrix C, the pairing optimization problem can then be formulated as follows:

max: ⟨S, C⟩, subject to: S ∈ S(n).
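As an illustration of these definitions, the following sketch (NumPy; variable names are our own, indices are 0-based, and the 1/2 normalization of the Frobenius inner product is the reconstructed convention above) builds a compatibility matrix C ∈ Ω_n, a pairing matrix S, and evaluates the total compatibility ⟨S, C⟩:

```python
import numpy as np

def random_compatibility(n, rng):
    """Random symmetric hollow compatibility matrix C in Omega_n."""
    M = rng.random((n, n))
    C = (M + M.T) / 2
    np.fill_diagonal(C, 0.0)
    return C

def pairing_matrix(pairs, n):
    """Pairing matrix S for a list of disjoint pairs covering all elements."""
    S = np.zeros((n, n))
    for i, j in pairs:
        S[i, j] = S[j, i] = 1.0
    return S

def total_compatibility(S, C):
    """<S, C> = (1/2) * Frobenius inner product: sums C_ij over paired {i,j}."""
    return 0.5 * np.sum(S * C)

rng = np.random.default_rng(0)
n = 6
C = random_compatibility(n, rng)
S = pairing_matrix([(0, 1), (2, 3), (4, 5)], n)
print(total_compatibility(S, C))  # equals C[0,1] + C[2,3] + C[4,5]
```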

Limited Observation Constraint
As briefly mentioned in Section 1, in practice there may often exist one more constraint on the pairing optimization problem. We will assume that initially we do not know the individual compatibility values. Moreover, we assume that only the value of the total compatibility ⟨S, C⟩ for any pairing S ∈ S(n) is observable. We call this condition the "limited observation constraint".
Under this constraint, we must execute two phases, the "observation phase" and the "combining phase", as introduced in our previous study [17]. First, in the observation phase, we estimate the ground-truth compatibility matrix C_g through observations of the total compatibilities of several pairings. We denote the estimated compatibility matrix by C_e. Our previous work [17] calculated the minimum number of observations necessary for deducing C_e and presented a simple algorithm for doing so efficiently.

Mathematical Properties of the Pairing Problem
In this section, we consider algebraic structures in the pairing problem. An equivalence relation is defined among compatibility matrices to construct equivalence classes. Then we show a conserved quantity within the equivalence class and that all members of the class yield the same total compatibility for any given pairing. Furthermore, the statistical properties of compatibility matrices are examined, forming the mathematical foundation of the variance optimization to be discussed in Section 4.

Adjacent Set
We define the adjacent set matrix R_i (1 ≤ i ≤ n) as the matrix in Ω_n whose ith row and ith column are filled with ones, except for the diagonal:

(R_i)_{j,k} = 1 if (j = i or k = i) and j ≠ k, and 0 otherwise.

We can also describe f_set(R_i) as follows:

f_set(R_i) = {{i, j} | j ∈ U(n), j ≠ i}.

With these adjacent sets, the following theorem holds.

Theorem 1. Ω_n = span({S}_{S∈S(n)} ∪ {R_i}_{1≤i≤n−1}). That is, for any C ∈ Ω_n, all values ⟨X, C⟩ with X ∈ Ω_n are determined by the total compatibilities {⟨S, C⟩ | S ∈ S(n)} together with the values {⟨R_i, C⟩ | 1 ≤ i ≤ n − 1}.

Note that ⟨R_n, C⟩ is not included, i.e., only n − 1 terms involving R_i are needed. Here, we have chosen to exclude index n without loss of generality.
Proof of Theorem 1. Our strategy to prove this involves calculating the dimension of the involved subspaces. First, we prove the equation

span{S}_{S∈S(n)} ∩ span{R_i}_{1≤i≤n−1} = {O_n}, (8)

where O_n denotes the n × n zero matrix. To check linear independence, we number all pairings as S_1, S_2, ..., S_u, ..., S_{(n−1)!!}, introduce coefficients a_u and b_v, and consider an element in the overlap of the spans:

Σ_u a_u S_u = Σ_{v=1}^{n−1} b_v R_v.

We focus on the summation of the kth column on both sides. Note that every S_u has exactly one non-zero element in column k, while R_v has n − 1 non-zero elements in column k if v = k (possible only for 1 ≤ k ≤ n − 1) and exactly one non-zero element otherwise. Then, the following equations hold:

Σ_u a_u = (n − 2) b_k + Σ_{v=1}^{n−1} b_v (1 ≤ k ≤ n − 1), (9)
Σ_u a_u = Σ_{v=1}^{n−1} b_v (k = n, because of our choice in formulating Theorem 1). (10)

With Equations (9) and (10), b_k = 0 (1 ≤ k ≤ n − 1) holds. This means that

span{S}_{S∈S(n)} ∩ span{R_i}_{1≤i≤n−1} = {O_n}. (11)

By our previous study [17],

dim span{S}_{S∈S(n)} = (n − 1)(n − 2)/2. (12)

Here, we denote L_min(n) ≡ (n − 1)(n − 2)/2. (13)

Setting all a_u = 0 above shows that the matrices R_1, ..., R_{n−1} are linearly independent; therefore,

dim span{R_i}_{1≤i≤n−1} = n − 1. (14)

Therefore, by Equations (11)-(14), the following equation holds:

dim(span{S}_{S∈S(n)} ⊕ span{R_i}_{1≤i≤n−1}) = L_min(n) + n − 1 = n(n − 1)/2 = dim Ω_n. (15)

The pairing matrices S are a subset of Ω_n. In addition, the adjacent set matrices R_i are also a subset of Ω_n. Therefore, the following equation holds:

span({S}_{S∈S(n)} ∪ {R_i}_{1≤i≤n−1}) ⊆ Ω_n. (16)

With Equations (15) and (16),

span({S}_{S∈S(n)} ∪ {R_i}_{1≤i≤n−1}) = Ω_n. (17)

That is, {S}_{S∈S(n)} together with {R_i}_{1≤i≤n−1} can construct Ω_n. Finally, ⟨X, C⟩ is linear in X, which comes from the properties of the Frobenius inner product. Therefore, for any X ∈ Ω_n, ⟨X, C⟩ can be constructed as a linear combination of {⟨S, C⟩ | S ∈ S(n)} and {⟨R_i, C⟩ | 1 ≤ i ≤ n − 1}. Therefore, the theorem holds.
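Theorem 1 can be checked numerically for a small case. The sketch below (NumPy; helper names are ours, indices are 0-based) enumerates all pairing matrices for n = 6, appends the adjacent set matrices R_1, ..., R_{n−1}, and verifies that together they span the n(n − 1)/2-dimensional space Ω_n:

```python
import numpy as np
from itertools import combinations

def adjacent_set_matrix(i, n):
    """R_i: row i and column i are 1 (off the diagonal), all else 0."""
    R = np.zeros((n, n))
    R[i, :] = 1.0
    R[:, i] = 1.0
    np.fill_diagonal(R, 0.0)
    return R

def all_pairings(elements):
    """Enumerate all perfect pairings of a list of elements."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for k, partner in enumerate(rest):
        for tail in all_pairings(rest[:k] + rest[k + 1:]):
            yield [(first, partner)] + tail

def upper_vec(M, n):
    """Flatten the strict upper triangle (coordinates of Omega_n)."""
    return np.array([M[i, j] for i, j in combinations(range(n), 2)])

n = 6
mats = []
for pairing in all_pairings(list(range(n))):
    S = np.zeros((n, n))
    for i, j in pairing:
        S[i, j] = S[j, i] = 1.0
    mats.append(upper_vec(S, n))
for i in range(n - 1):            # R_n is omitted, as in Theorem 1
    mats.append(upper_vec(adjacent_set_matrix(i, n), n))
rank = np.linalg.matrix_rank(np.array(mats))
print(rank, n * (n - 1) // 2)     # both equal dim Omega_n = 15 for n = 6
```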

Corollary 1.
For A, B ∈ Ω_n, A = B if and only if

∀S ∈ S(n), ⟨S, A⟩ = ⟨S, B⟩ and ∀i ∈ {1, ..., n − 1}, ⟨R_i, A⟩ = ⟨R_i, B⟩. (18)

This corollary is a special case of Theorem 1, because Equation (18) means that A and B have the same total compatibilities for all pairings and the same values for all adjacent sets.
Here, we present an example for Theorem 1 for the n = 4 case to illustrate the relationship of the involved subspaces. Let D_{i,j} ∈ Ω_n denote the matrix whose (i, j)th and (j, i)th elements are 1 and all other elements are 0. For n = 4, the three pairing matrices and the adjacent set matrices can be represented as

S_1 = D_{1,2} + D_{3,4}, S_2 = D_{1,3} + D_{2,4}, S_3 = D_{1,4} + D_{2,3}, R_i = Σ_{j≠i} D_{i,j}.

The relationship of these spaces is represented in Figure 1: the six matrices S_1, S_2, S_3, R_1, R_2, R_3 span the six-dimensional space Ω_4.

Equivalence Class
We define the relation ∼ as follows: for A, B ∈ Ω_n,

A ∼ B ⟺ ∀S ∈ S(n), ⟨S, A⟩ = ⟨S, B⟩.

This is an equivalence relation between compatibility matrices, leading to the construction of equivalence classes. Regarding these equivalence classes, the following theorem holds:

Theorem 2.
For A, B ∈ Ω_n, A ∼ B if and only if

∀{i, j} ∈ P(n), A_i,j − (⟨R_i, A⟩ + ⟨R_j, A⟩)/(n − 2) = B_i,j − (⟨R_i, B⟩ + ⟨R_j, B⟩)/(n − 2).

That is, for any matrix C in the equivalence class, the values given by the following are conserved:

∀{i, j} ∈ P(n), C_i,j − (⟨R_i, C⟩ + ⟨R_j, C⟩)/(n − 2).

The matrix form of the conserved values is described in Appendix A.
Proof of Theorem 2. First, we prove sufficiency. We assume that the following equation holds for all {i, j} ∈ P(n):

A_i,j − (⟨R_i, A⟩ + ⟨R_j, A⟩)/(n − 2) = B_i,j − (⟨R_i, B⟩ + ⟨R_j, B⟩)/(n − 2). (27)

Summing Equation (27) over all pairs in P(n), the following equation holds:

Σ_{{i,j}∈P(n)} A_i,j − (1/(n − 2)) Σ_{{i,j}∈P(n)} (⟨R_i, A⟩ + ⟨R_j, A⟩) = Σ_{{i,j}∈P(n)} B_i,j − (1/(n − 2)) Σ_{{i,j}∈P(n)} (⟨R_i, B⟩ + ⟨R_j, B⟩). (28)

Here, the left side can be calculated as follows, because the number of pairs including element k in P(n) is n − 1:

Σ_{{i,j}∈P(n)} (⟨R_i, A⟩ + ⟨R_j, A⟩) = (n − 1) Σ_{k=1}^{n} ⟨R_k, A⟩, (29)

and, in addition, Σ_{{i,j}∈P(n)} A_i,j = (1/2) Σ_{k=1}^{n} ⟨R_k, A⟩. Using Equation (29), Equation (28) is transformed into the following:

−(n/(2(n − 2))) Σ_{k=1}^{n} ⟨R_k, A⟩ = −(n/(2(n − 2))) Σ_{k=1}^{n} ⟨R_k, B⟩. (30)

Therefore,

Σ_{k=1}^{n} ⟨R_k, A⟩ = Σ_{k=1}^{n} ⟨R_k, B⟩. (31)

The following equation holds for any pairing S by Equation (27):

⟨S, A⟩ = ⟨S, B⟩ + (1/(n − 2)) Σ_{{i,j}∈f_set(S)} (⟨R_i, A⟩ + ⟨R_j, A⟩ − ⟨R_i, B⟩ − ⟨R_j, B⟩). (32)

Here, the following equation holds. Note that every index k ranging from 1 to n appears exactly once over the pairs {i, j} ∈ f_set(S); hence, each ⟨R_k, A⟩ appears only once in the summation:

Σ_{{i,j}∈f_set(S)} (⟨R_i, A⟩ + ⟨R_j, A⟩) = Σ_{k=1}^{n} ⟨R_k, A⟩. (33)

For B, the following equation also holds:

Σ_{{i,j}∈f_set(S)} (⟨R_i, B⟩ + ⟨R_j, B⟩) = Σ_{k=1}^{n} ⟨R_k, B⟩. (34)

Using these transformations, Equation (32) is transformed as follows:

⟨S, A⟩ = ⟨S, B⟩ + (1/(n − 2)) (Σ_{k=1}^{n} ⟨R_k, A⟩ − Σ_{k=1}^{n} ⟨R_k, B⟩). (35)

With Equation (31), ⟨S, A⟩ = ⟨S, B⟩ for every S ∈ S(n). Then, A ∼ B holds.

Second, we prove the necessity. We assume that A ∼ B holds. We define A*, B* ∈ Ω_n as follows:

A*_i,j ≡ A_i,j − (⟨R_i, A⟩ + ⟨R_j, A⟩)/(n − 2), B*_i,j ≡ B_i,j − (⟨R_i, B⟩ + ⟨R_j, B⟩)/(n − 2). (39)

We derive the relationship between Σ_{k=1}^{n} ⟨R_k, A⟩ and Σ_{S∈S(n)} ⟨S, A⟩ here in order to use the assumption A ∼ B. Counting each element once per row sum,

Σ_{k=1}^{n} ⟨R_k, A⟩ = 2 Σ_{{i,j}∈P(n)} A_i,j. (41)

For Σ_{S∈S(n)} ⟨S, A⟩, we focus on the fact that the number of appearances of A_i,j over all pairings is (n − 3)!!:

Σ_{S∈S(n)} ⟨S, A⟩ = (n − 3)!! Σ_{{i,j}∈P(n)} A_i,j. (42)

With Equations (41) and (42), the following relationship holds:

Σ_{k=1}^{n} ⟨R_k, A⟩ = (2/(n − 3)!!) Σ_{S∈S(n)} ⟨S, A⟩. (43)

Therefore, the following holds by A ∼ B and Equation (43):

Σ_{k=1}^{n} ⟨R_k, A⟩ = Σ_{k=1}^{n} ⟨R_k, B⟩. (44)

By Equations (33) and (39), for any pairing S,

⟨S, A*⟩ = ⟨S, A⟩ − (1/(n − 2)) Σ_{k=1}^{n} ⟨R_k, A⟩, (45)

and similarly for B*. In addition, A ∼ B holds. Therefore, with Equation (44),

∀S ∈ S(n), ⟨S, A*⟩ = ⟨S, B*⟩. (46)

Additionally, the following also holds by Equation (39):

⟨R_k, A*⟩ = ⟨R_k, A⟩ − ((n − 2)⟨R_k, A⟩ + Σ_{v=1}^{n} ⟨R_v, A⟩)/(n − 2) = −(1/(n − 2)) Σ_{v=1}^{n} ⟨R_v, A⟩. (47)

By Equations (44) and (47),

∀k ∈ U(n), ⟨R_k, A*⟩ = ⟨R_k, B*⟩. (48)

Therefore, by Equations (46) and (48) and Corollary 1, A* = B* is valid. That is to say, the following equation holds:

∀{i, j} ∈ P(n), A_i,j − (⟨R_i, A⟩ + ⟨R_j, A⟩)/(n − 2) = B_i,j − (⟨R_i, B⟩ + ⟨R_j, B⟩)/(n − 2). (50)

Therefore, the theorem holds.
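The invariance of Theorem 2 can also be verified numerically. Assuming the reconstructed form of the conserved quantity, Q_ij(C) = C_ij − (⟨R_i, C⟩ + ⟨R_j, C⟩)/(n − 2), the sketch below builds B ∼ A by adding t_i + t_j (with Σ t_k = 0) to the off-diagonal entries of A, and checks both the equal total compatibilities and the equal conserved values (NumPy; 0-based indices; names are ours):

```python
import numpy as np

def conserved(C):
    """Q_ij = C_ij - (<R_i,C> + <R_j,C>)/(n-2): the invariant of Theorem 2.
    <R_k, C> is simply the kth row sum of the hollow matrix C."""
    m = C.shape[0]
    r = C.sum(axis=1)
    return C - (r[:, None] + r[None, :]) / (m - 2)

rng = np.random.default_rng(1)
n = 8
M = rng.random((n, n)); A = (M + M.T) / 2; np.fill_diagonal(A, 0.0)

# Build B ~ A: add t_i + t_j to each off-diagonal entry, with sum(t) = 0.
t = rng.random(n); t -= t.mean()
B = A + t[:, None] + t[None, :]
np.fill_diagonal(B, 0.0)

# Any pairing has the same total compatibility for A and B ...
S = np.zeros((n, n))
for i, j in [(0, 3), (1, 6), (2, 5), (4, 7)]:
    S[i, j] = S[j, i] = 1.0
print(0.5 * np.sum(S * A) - 0.5 * np.sum(S * B))  # ~ 0

# ... and the conserved values coincide (off the diagonal).
QA, QB = conserved(A), conserved(B)
mask = ~np.eye(n, dtype=bool)
print(np.max(np.abs(QA[mask] - QB[mask])))        # ~ 0
```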

Mean and Covariance
Here, we analyze statistical properties associated with the compatibility matrix and the total compatibility.
We define the mean values of compatibilities and total compatibilities as

µ_element(C) ≡ (2/(n(n − 1))) Σ_{{i,j}∈P(n)} C_i,j, (51)
µ_sum(C) ≡ (1/(n − 1)!!) Σ_{S∈S(n)} ⟨S, C⟩, (52)

where µ_element(C) indicates the mean value of the elements of the compatibility matrix C and µ_sum(C) is the mean of the total compatibility across all possible pairings with respect to the compatibility matrix C. We define the covariances for compatibilities and total compatibilities as follows:

σ²_element(A, B) ≡ (2/(n(n − 1))) Σ_{{i,j}∈P(n)} (A_i,j − µ_element(A))(B_i,j − µ_element(B)),
σ²_sum(A, B) ≡ (1/(n − 1)!!) Σ_{S∈S(n)} (⟨S, A⟩ − µ_sum(A))(⟨S, B⟩ − µ_sum(B)).

Clearly, σ²_element(C, C) and σ²_sum(C, C) are the variances of the compatibilities and the total compatibilities when the compatibility matrix is C.
Theorem 3. Let I_n be the n × n identity matrix, J_n the n × n matrix where all elements are 1, and, for C ∈ Ω_n, Ĉ ≡ C − µ_element(C)(J_n − I_n). Then, the following equation holds:

σ²_sum(C, C) = ((n − 2) Σ_{{i,j}∈P(n)} Ĉ_i,j² − Σ_{k=1}^{n} ⟨R_k, Ĉ⟩²) / ((n − 1)(n − 3)). (53)

Proof of Theorem 3. By definition, Ĉ subtracts the mean element value from every off-diagonal entry, so Σ_{{i,j}∈P(n)} Ĉ_i,j = 0. Using Equation (51), every pairing satisfies ⟨S, C⟩ = ⟨S, Ĉ⟩ + (n/2)µ_element(C), because a pairing contains n/2 pairs. Hence µ_sum(Ĉ) = 0 and

σ²_sum(C, C) = σ²_sum(Ĉ, Ĉ) = (1/(n − 1)!!) Σ_{S∈S(n)} ⟨S, Ĉ⟩². (54)

Here, the following counting holds: a pair p ∈ P(n) is contained in (n − 3)!! of the (n − 1)!! pairings, and two distinct pairs p ≠ q are contained simultaneously in (n − 5)!! pairings if they are disjoint and in no pairing if they share an element. Writing Ĉ_p ≡ Ĉ_i,j for p = {i, j}, Equation (54) expands to

σ²_sum(C, C) = (1/(n − 1)) Σ_{p∈P(n)} Ĉ_p² + (1/((n − 1)(n − 3))) Σ_{p≠q, disjoint} Ĉ_p Ĉ_q. (55)

Then, using Σ_{p∈P(n)} Ĉ_p = 0 and

Σ_{p≠q, sharing an element} Ĉ_p Ĉ_q = Σ_{k=1}^{n} ⟨R_k, Ĉ⟩² − 2 Σ_{p∈P(n)} Ĉ_p², (56)

the disjoint sum becomes

Σ_{p≠q, disjoint} Ĉ_p Ĉ_q = (Σ_p Ĉ_p)² − Σ_p Ĉ_p² − Σ_{p≠q, sharing} Ĉ_p Ĉ_q = Σ_{p∈P(n)} Ĉ_p² − Σ_{k=1}^{n} ⟨R_k, Ĉ⟩². (58)

By Equations (55) and (58), the following equation holds:

σ²_sum(C, C) = ((n − 2) Σ_{p∈P(n)} Ĉ_p² − Σ_{k=1}^{n} ⟨R_k, Ĉ⟩²) / ((n − 1)(n − 3)). (59)

Therefore, the theorem holds.
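Equation (53), in the reconstructed form above, can be cross-checked against a brute-force enumeration of all (n − 1)!! pairings for a small n (NumPy; helper names are ours):

```python
import numpy as np

def all_pairings(elems):
    """Enumerate all perfect pairings of a list of elements."""
    if not elems:
        yield []
        return
    a, rest = elems[0], elems[1:]
    for k, b in enumerate(rest):
        for tail in all_pairings(rest[:k] + rest[k + 1:]):
            yield [(a, b)] + tail

n = 6
rng = np.random.default_rng(2)
M = rng.random((n, n)); C = (M + M.T) / 2; np.fill_diagonal(C, 0.0)

# Brute-force variance of <S, C> over all (n-1)!! = 15 pairings.
totals = [sum(C[i, j] for i, j in p) for p in all_pairings(list(range(n)))]
var_enum = np.var(totals)

# Closed form (reconstructed Theorem 3): center C, then combine the
# element sum of squares with the adjacent-set sums <R_k, C_hat>.
mu = C.sum() / (n * (n - 1))                  # mean off-diagonal element
C_hat = C - mu * (np.ones((n, n)) - np.eye(n))
sum_sq = 0.5 * np.sum(C_hat ** 2)             # sum over pairs {i, j}
r_sq = np.sum(C_hat.sum(axis=1) ** 2)         # sum_k <R_k, C_hat>^2
var_formula = ((n - 2) * sum_sq - r_sq) / ((n - 1) * (n - 3))
print(var_enum, var_formula)                  # the two values should match
```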

Variance Optimization
This section examines the performance enhancement from deriving a pairing that yields higher total compatibility by exploiting the algebraic structures identified in the previous section. We first show that the variance of the elements in a compatibility matrix affects the performance of the heuristic algorithm proposed in our previous study. Then we propose the transformation of a compatibility matrix to another one that minimizes the variance while ensuring that the total compatibility is maintained.

Performance Degradation through the Observation Phase
In our previous study [17], we proposed an algorithm for recognizing the compatibilities among elements through multiple measurements of the total compatibility. To summarize, the observation phase produces an estimated compatibility matrix C̃ ∈ Ω_n. This C̃ ∈ Ω_n is one of the elements in the equivalence class; that is, C ∼ C̃ holds. By this property, the dimension of span{S}_{S∈S(n)} is given by (n − 1)(n − 2)/2, which we refer to as L_min(n). This means that the number of observations required to grasp the compatibilities (up to equivalence) through an observation phase is L_min(n). Indeed, our previous study proposed an observation algorithm which needs O(n^2) measurements. We have also confirmed numerically that the observation strategy provides a compatibility matrix in the equivalence class of the ground-truth compatibility matrix C_g. In those numerical studies, the elements of the ground-truth compatibility matrix, C_g i,j, were specified by uniformly distributed random numbers in the range [0, 1].
However, finding a pairing yielding a greater total compatibility becomes difficult when based on C_e, including the above-mentioned C̃, even though C_e is in the equivalence class containing the ground-truth compatibility matrix C_g. In searching for a better pairing, we use a heuristic algorithm named Pairing-2-opt [17]. We consider that the difficulty comes from the fact that the variance of the elements of the compatibility matrix, σ²_element(C_e, C_e), tends to be larger than σ²_element(C_g, C_g), which is highly likely to cause the combining algorithm to become stuck in a local optimum.
Hence, our idea is to find a compatibility matrix X in the same equivalence class as C, i.e., ∀S ∈ S(n), ⟨S, X⟩ = ⟨S, C⟩, while simultaneously minimizing the variance of its elements, σ²_element(X, X).

Transforming the Compatibility Matrix with Minimized Variance
We solve the following optimization problem:

min: σ²_element(X, X), subject to: X ∈ Ω_n, X ∼ C, where C is fixed. (61)

By Theorem 3 and the invariance σ²_sum(X, X) = σ²_sum(C, C) within the equivalence class, we transform this problem into the following form, where X̂ ≡ X − µ_element(X)(J_n − I_n):

min: Σ_{k=1}^{n} ⟨R_k, X̂⟩², subject to: X ∈ Ω_n, X ∼ C. (62)

The optimal solution for this problem holds because a sum of squares is minimized when all of its terms are 0, and this is attainable within the equivalence class:

⟨R_k, X̂⟩ = 0 (1 ≤ k ≤ n). (64)

Because the sum of all elements is conserved within the equivalence class, µ_element(X) = µ_element(C); hence, the following equation is derived:

⟨R_i, X⟩ = (n − 1) µ_element(C) (1 ≤ i ≤ n). (65)

By Equation (65) and Theorem 2, the optimal solution is represented as follows:

X_i,j = C_i,j − (⟨R_i, C⟩ + ⟨R_j, C⟩)/(n − 2) + (2(n − 1)/(n − 2)) µ_element(C). (66)

Thus, the compatibility matrix with minimal variance is derived. In addition, this discussion and solution show that the minimal-variance solution is unique within each equivalence class.
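The transformation of Equation (66), in the reconstructed form above, can be implemented directly. The sketch below (NumPy; 0-based indices; names are ours) applies the variance optimization to a random C and checks that the total compatibilities are unchanged while all adjacent-set sums of the centered matrix vanish:

```python
import numpy as np

def variance_optimize(C):
    """Map C to the member X of its equivalence class with minimal element
    variance: X_ij = C_ij - (<R_i,C> + <R_j,C>)/(n-2)
                            + 2(n-1)/(n-2) * mu_element(C)."""
    n = C.shape[0]
    r = C.sum(axis=1)                      # <R_k, C>, the row sums
    mu = C.sum() / (n * (n - 1))           # mean of the off-diagonal elements
    X = C - (r[:, None] + r[None, :]) / (n - 2) + 2 * (n - 1) * mu / (n - 2)
    np.fill_diagonal(X, 0.0)
    return X

rng = np.random.default_rng(3)
n = 8
M = rng.random((n, n)); C = (M + M.T) / 2; np.fill_diagonal(C, 0.0)
X = variance_optimize(C)

# X stays in the same equivalence class: any pairing keeps its total compatibility.
S = np.zeros((n, n))
for i, j in [(0, 1), (2, 3), (4, 5), (6, 7)]:
    S[i, j] = S[j, i] = 1.0
print(0.5 * np.sum(S * C) - 0.5 * np.sum(S * X))   # ~ 0

# All adjacent-set sums of the centered matrix vanish, so the variance is minimal.
mu_X = X.sum() / (n * (n - 1))
X_hat = X - mu_X * (np.ones((n, n)) - np.eye(n))
print(np.max(np.abs(X_hat.sum(axis=1))))           # ~ 0
```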

Simulation
In this section, we evaluate the performance of the proposed method on the pairing optimization problem. Two important points should be clarified through the simulations. One is to quantitatively evaluate the performance reduction of the combining algorithm proposed in the previous study when it operates on the output of the observation phase. The other is to demonstrate the performance enhancement due to the variance optimization discussed in Section 4.

Setting
We configure the ground-truth compatibility matrix C_g ∈ Ω_n with two different distributions. The first is the uniform distribution:

C_g i,j ~ U(0, 1) (i < j), (67)

where U(0, 1) denotes the uniform distribution between 0 and 1. The second is the Poisson distribution:

C_g i,j ~ Poisson(λ) (i < j), (68)

where Poisson(λ) denotes the Poisson distribution whose mean is λ. In the numerical simulation, the number of elements in the system n varied from 100 to 1000 in intervals of 100. For each n, we conducted 100 trials with different randomly generated ground-truth compatibility matrices C_g based on Equation (67) or Equation (68). We quantified the performance for each derived pairing S ∈ S(n) by 2⟨S, C_g⟩/n and evaluated its average over the 100 trials for each value of n.
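A minimal sketch of this setup follows (NumPy; function names are ours, and the Poisson mean is set to lam = 1.0 as an assumption, since the text leaves λ unspecified):

```python
import numpy as np

def ground_truth(n, dist, rng, lam=1.0):
    """Ground-truth compatibility matrix C_g with i.i.d. entries on the
    upper triangle, following Equation (67) or (68)."""
    if dist == "uniform":
        M = rng.random((n, n))                        # U(0, 1)
    elif dist == "poisson":
        M = rng.poisson(lam, size=(n, n)).astype(float)
    else:
        raise ValueError(dist)
    C = np.triu(M, k=1)
    return C + C.T                                    # symmetric and hollow

def performance(S, C):
    """Normalized total compatibility 2<S, C>/n used as the figure of merit."""
    n = C.shape[0]
    return 2 * (0.5 * np.sum(S * C)) / n

rng = np.random.default_rng(5)
Cg = ground_truth(100, "uniform", rng)
print(Cg.shape, np.allclose(Cg, Cg.T))  # (100, 100) True
```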

Simulation Flow
The ground-truth compatibility matrix C_g is first transformed into C_e1 by the observation algorithm of [17]. The variance optimization then transforms C_e1 into C_e2. Finally, the combining algorithm, which is called PNN+p2-opt [17], yields a pairing with the intention of achieving high total compatibility. The exchange limit l is an internal parameter of PNN+p2-opt that determines the maximum number of exchange trials; it is set to 600 in the present study.
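The exact PNN+p2-opt algorithm is specified in [17]. Purely as a rough illustration of the combining phase, the following is a simplified greedy-construction-plus-pair-exchange sketch in the same spirit; it is not the authors' algorithm, and all names are ours:

```python
import numpy as np

def greedy_pairing(C):
    """Greedy construction: repeatedly pair the two unpaired elements with
    the highest remaining compatibility (a stand-in for the construction
    step; the exact PNN procedure is given in [17])."""
    n = C.shape[0]
    free = set(range(n))
    pairs = []
    order = sorted(((C[i, j], i, j) for i in range(n) for j in range(i + 1, n)),
                   reverse=True)
    for c, i, j in order:
        if i in free and j in free:
            pairs.append((i, j))
            free -= {i, j}
    return pairs

def pair_exchange(C, pairs, limit=600):
    """2-opt-style local search: swap partners between two pairs whenever
    the exchange increases the total compatibility, up to `limit` sweeps."""
    pairs = [list(p) for p in pairs]
    for _ in range(limit):
        improved = False
        for a in range(len(pairs)):
            for b in range(a + 1, len(pairs)):
                (i, j), (k, l) = pairs[a], pairs[b]
                best = C[i, j] + C[k, l]
                for (p, q), (r, s) in (((i, k), (j, l)), ((i, l), (j, k))):
                    if C[p, q] + C[r, s] > best + 1e-12:
                        pairs[a], pairs[b] = [p, q], [r, s]
                        best = C[p, q] + C[r, s]
                        improved = True
        if not improved:
            break
    return [tuple(p) for p in pairs]

rng = np.random.default_rng(4)
n = 10
M = rng.random((n, n)); C = (M + M.T) / 2; np.fill_diagonal(C, 0.0)
pairs = pair_exchange(C, greedy_pairing(C))
print(sum(C[i, j] for i, j in pairs))  # total compatibility of the found pairing
```

The exchange step only ever accepts improving swaps, so the final total compatibility is at least that of the greedy construction.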
We evaluated the performance on the basis of C_g, C_e1, and C_e2, as shown in flows (i), (ii), and (iii), respectively, in Figure 2.

Figure 2. Schematic illustration of the three heuristic pairing optimization algorithms tested in the simulation. Case (i) (blue) applies the combining algorithm directly to the ground-truth compatibility matrix C_g. Case (ii) (red) first applies the observation algorithm to obtain an estimated compatibility matrix C_e1, followed by the combining algorithm. Case (iii) (yellow) first estimates the compatibility from observation (C_e1), followed by the variance optimization (C_e2), and then executes the combining algorithm.

Performance
The blue, red, and yellow curves in Figure 3 demonstrate the performance of cases (i), (ii), and (iii), respectively, as a function of the number of elements for the uniform distribution (Figure 3a) and the Poisson distribution (Figure 3b). For the uniformly distributed ground truth, we observe that the performance of case (ii) is inferior to that of case (i), demonstrating the performance degradation caused by the transformation from C_g to C_e1 through observation. Furthermore, the performance of case (iii) is enhanced compared with that of case (ii), which confirms the performance gain from variance optimization. The results differ for the Poisson distribution: here, the performance of case (iii) is higher than that of case (i). That is, for the Poisson case, the variance optimization (flow (iii)) not only counteracted the performance loss of the observation algorithm (flow (ii)) but actually enhanced the performance compared with using the ground-truth matrix C_g directly (flow (i)). Further numerical tests revealed that the relationship of the performances for a Gaussian distribution is similar to that for the uniform distribution. Conversely, the performance for a binary distribution hardly differed between any of the algorithms. The variances of C_g, C_e1, and C_e2 are evaluated in Figure 4 as a function of the number of elements. We clearly observe that the variance of C_e1 is higher than that of C_g, while the variance of C_e2 becomes comparable to that of the ground-truth case C_g for both the uniform and Poisson distributions.
From these numerical results, we can conclude that the variance optimization minimizes the variance and enhances the performance of the achieved total compatibility. It is worth noting that the performance with the uniform distribution after variance optimization is still lower than the case based on the ground-truth matrix C g , as observed in Figure 3a. This occurs because the variance optimization algorithm does not transform C e 1 to the original compatibility matrix C g . In other words, there exist additional factors that influence the performance of the combining algorithm that are related to the compatibility distribution. The distribution of the original compatibility C g (uniform distribution) is seemingly beneficial for the performance of the heuristic combining algorithm, even when compared to the compatibility matrix with minimum variance C e 2 .

Conclusions
One of the most challenging issues in the pairing problem is how to understand the underlying compatibilities among the elements under study. An accurate and efficient approach is essential for practical applications such as wireless communications and online social networks. This study reveals several algebraic structures in the pairing optimization problem.
We introduce an equivalence class in the compatibility matrices, containing matrices that yield the same total compatibility although the matrices themselves differ. This can also be expressed through a conserved value or invariance in the equivalence class. Based on such insights, we propose a transformation of the initially estimated compatibility matrix to another form that minimizes the variance of the elements. We demonstrate that the highest total compatibility found heuristically is improved significantly with the proposed transformation relative to the direct approach.
In the future, the proposed algorithm may be applied to bipartite matching and assignment problems, for example. In such cases, if the compatibility between elements that should not be paired is set to a negative value with a relatively large absolute value, the problem may be solved heuristically by the pairing algorithm. Hence, the variance optimization proposed in this study may aid in performance enhancement there as well.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author upon reasonable request.

Figure A1 shows that, as expected, the PNN+p2-opt algorithm is significantly faster than the conventional algorithm, and that Flow (i) is the fastest, followed by Flow (ii) and then Flow (iii). These computational times can be explained as follows. First, PNN+p2-opt is a heuristic O(n^2) algorithm; therefore, it is significantly faster than a conventional MWM algorithm that aims to find the exact best solution. Second, the measured times include the observation and variance optimization procedures, so the computational times of Flows (ii) and (iii) are longer than that of Flow (i). Third, Flow (ii) has a tendency to become stuck in local optima, resulting in less computational time than Flow (iii) due to the earlier termination of the p2-opt algorithm.
In the future, the comparison to machine-learning-based methods such as the one proposed in Ref. [16] is of great interest. However, at this point, it is unclear how to conduct a fair comparison, as the ML-based algorithm requires extensive training on multiple examples before it is able to solve the problem. Nevertheless, as machine learning is a rapidly evolving field, it is possible that ML-based algorithms specialized for the pairing problem could be developed in the near future.