1. Introduction
One of the most fundamental pillars of modern life is telecommunication in general, and wireless telecommunication networks in particular. These serve literally billions of requests every week, not only from phones but also from devices of the Internet of Things (IoT). In this paper, we refer to all these different units as users.
The first wireless networks were constructed around 1970 and were envisaged to be built from small cells, each cell served by one base station (BS for short), so the network was decomposed into smaller parts that served the users independently of each other. (At that time, the users were bulky mobile phones; see, for example, [1].) While this idea is simple, the design is afflicted by the well-known cell-edge problem: users located in the overlapping area of two cells suffer from strong interference from the neighboring BSs. This problem has created a considerable challenge for service providers.
In the current coordinated multipoint (CoMP) transmission technology, several (typically physically close) base stations are organized, permanently or dynamically, into one cell, and the BSs in the same cell jointly serve all the users in this cell. The goal of this approach is again to decompose a large-scale network into smaller parts: the constructed clusters (again called cells) have less engineering complexity than the original network, and they can operate in parallel.
Currently, considering the enormous number of base stations and users, and in spite of the multi-cell approach, the cell-edge problem stubbornly remains with us (see [2]). For 5G networks, the cell-edge problem can become even more pronounced because more users fall into the cell-edge area as the size of each cell shrinks (see [3]).
Mathematically speaking, minimizing the cell-edge problem belongs to the family of clustering problems: given a set of (not necessarily homogeneous) objects, we want to divide the objects into pairwise disjoint classes while optimizing some “measure” of the resulting set of classes.
Myriads of theoretical and practical problems belong to the clustering framework. They range from classical combinatorial optimization problems to printed circuit board design, from VLSI CAD applications to distributing tasks among processors for supercomputing, and from pattern recognition to computer vision and image retrieval. The computational complexity of these problems varies from easy to very hard. For example, the minimum number of edges whose deletion places two predefined vertices into separate components in a simple graph (the minimum cut problem [4]) can be computed in polynomial time. The situation changes dramatically if we want to separate three predefined vertices (the multiway cut problem). In the general case, this generalization was shown to be NP-hard [5], while the problem becomes fixed-parameter tractable if the input is restricted to planar graphs [5,6].
Probably the very first engineering problem of a clustering nature was the following: we want to place a complicated electronic design on printed circuit boards, where each board can contain at most k components and where the electronic connections among the boards are expensive compared to connections inside a board ([7]; see also [8]). It can be seen that in such a real-world problem, an upper bound on the possible size of the clusters is given too. While the notion of NP-completeness was only being developed at the time, the authors correctly placed the problem into the NP-hard class.
The majority of clustering problems are NP-hard, so there is no hope of solving them exactly in polynomial time (unless P = NP). Sometimes there are known performance guarantees on the solution; for example, in [5], a polynomial-time algorithm with a $2(1-1/k)$ approximation ratio is given for the multiway cut problem, while in [9], a polynomial-time $(1.5-1/k)$-approximation algorithm was developed for the same problem.
The interference minimization problem in wireless networks is a known NP-hard problem, and so far no constant-factor approximation algorithm has been found for it. In the literature, there are several approaches to solving this problem; see [1,10,11,12,13,14]. Because of the intractability of the problem, all of these approaches are heuristic in nature. In practice, these methods still do not satisfactorily solve the cell-edge problem [2].
Essentially, all of these methods use one of two general clustering methods: kernel k-means or spectral clustering. The former method was developed in [15,16]. As was proved in [17], the two approaches are essentially equivalent to each other (for a survey of these methods, see [18]). Consequently, we compare our algorithm to the spectral clustering method (as it is used in [3]). That approach attacks the clustering problem as an undivided one, partitioning the base stations and users simultaneously. However, the two sets of agents typically have different cardinalities, and their elements have very different functions and properties in the network.
In this paper, we propose a simple and fast clustering algorithm to deal with the cell-edge problem. (An earlier version of this algorithm was reported in [19].) Our algorithm runs significantly faster than the spectral clustering method, and simulations on synthetic instances indicate that our proposed method typically provides higher-quality solutions than the application of the spectral clustering method in [3]. In contrast with spectral clustering methods, our proposed heuristic method is deterministic.
We divide the interference minimization problem into three subproblems. In the first phase, a new, so-called dot-product similarity measure is introduced on pairs of base stations. This similarity measure is based on the dynamically changing signal strengths between the users and the base stations. The two subsequent phases are two clustering problems. The second phase partitions the base stations into clusters, and the third phase assigns the users to base station clusters. The solution to the whole problem is given as a pairing between the base station clusters and user clusters.
Our reason for this treatment is the following observation: the roles of the base stations and the users are different, and so they require different considerations. We emphasize this asymmetry with our notation system as well. The clusters of the entire system appear as pairs of clusters: one on the index set of the base stations, and one on the index set of the users; cluster classes with the same subscript serve together as a cluster class of the entire system.
The novelty of our method, which is responsible for its superior performance, lies in the use of the dynamic similarity function between the base stations. The second and third phases can use off-the-shelf clustering algorithms, which allows possible further fine-tuning of our method.
2. The Total Interference Minimization Problem
The formulation of the total interference minimization problem that we use in this paper was proposed and studied by Dai and Bai in their influential paper [3]. We give our description of their formulation in the following paragraphs.
There is a collection $B:=\{{b}_{i}:i\in I\}$ of distinct base stations and there is a collection $U:=\{{u}_{j}:j\in J\}$ of distinct users, where $b=|B|$ and $u=|U|$, and the base stations and the users are indexed by the sets of the first $b$ and $u$ natural numbers, respectively. As we mentioned earlier, the users can be mobile phones, but can also be devices of the IoT. Therefore, their number altogether can be rather large compared to the number of base stations. (However, in future 6G networks, the ratio of these numbers may change considerably. We do not consider this case here.)
The model depicts the network with a bipartite graph: one class contains the BSs, while the other class consists of the users. Let $G=(V;E):=(B\cup U;E)$ where E consists of ordered pairs of form $({b}_{i},{u}_{j})$. We use the shorthand $(i,j)$ as well.
We define a weight function $w:B\times U\to {\mathbb{R}}_{0}^{+}$, where each ${w}_{i,j}$ is a positive real number if and only if $(i,j)$ is an edge in G, otherwise we set ${w}_{i,j}=0$. The weight of an edge of the bipartite graph represents the signal strength between its endpoints (one BS and one user).
Let M be a positive integer, and let $\left[M\right]$ be the set of the integers from 1 to M. Let $\mathcal{I}=({I}_{1},\dots ,{I}_{M})$ and $\mathcal{J}=({J}_{1},\dots ,{J}_{M})$ be partitions of I and J, respectively. Finally, let $\mathcal{P}$ be the set of partition pairs: $\mathcal{P}:=\{{P}_{\mathcal{l}}=({I}_{\mathcal{l}},{J}_{\mathcal{l}}):\mathcal{l}\in \left[M\right]\}$.
Let us define the following quantities: for each integer $\ell\in\left[M\right]$, let
$$w({P}_{\ell}):=\sum_{i\in {I}_{\ell}}\sum_{j\in {J}_{\ell}}{w}_{i,j}\quad\text{and}\quad \overline{w}({P}_{\ell}):=\sum_{i\in {I}_{\ell}}\sum_{j\in J\setminus {J}_{\ell}}{w}_{i,j}+\sum_{i\in I\setminus {I}_{\ell}}\sum_{j\in {J}_{\ell}}{w}_{i,j}. \qquad (1)$$
In graph theoretical terms, the first quantity is the weight of the partition class, while the second one is the cut value of the partition class.
Definition 1 (IF-cluster system). A partition $\mathcal{P}$ is an IF-cluster system (or IF-cluster for short; IF abbreviates interference) if
 (i) there is no partition class ${P}_{\ell}$ such that ${I}_{\ell}=\varnothing$, and
 (ii) for any user, there is a base station in its cluster to which it is joined by an edge in G.
The total interference (see [3]) of a given IF-cluster system $\mathcal{P}$ is defined as
$$\mathtt{tinf}(\mathcal{P}):={\sum_{\ell\in\left[M\right]}}^{*}\;\frac{\overline{w}({P}_{\ell})}{w({P}_{\ell})}, \qquad (2)$$
where the star superscript denotes that if a partition class ${J}_{k}$ is empty, then the index ℓ skips k, so that the formula in (2) is well-defined. Condition (ii) above covers the requirement that all users must be served in a solution. The omission of a partition class in the sum (2) reflects the practical possibility that base stations that do not serve any users can be switched off temporarily.
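For concreteness, the objective in Equation (2) can be evaluated directly; below is a minimal Python sketch (the function and variable names are ours, and we take the cut value of a class to be the total weight of edges with exactly one endpoint in the class):

```python
def total_interference(W, bs_clusters, user_clusters):
    """Total interference of a partition pair.

    W[i][j] is the signal strength between BS i and user j; bs_clusters and
    user_clusters are parallel lists of index lists (I_1..I_M and J_1..J_M).
    Classes with an empty user set are skipped, as in the starred sum.
    """
    b, u = len(W), len(W[0])
    tinf = 0.0
    for I_l, J_l in zip(bs_clusters, user_clusters):
        if not J_l:
            continue
        inside = sum(W[i][j] for i in I_l for j in J_l)
        incident = (sum(W[i][j] for i in I_l for j in range(u))
                    + sum(W[i][j] for i in range(b) for j in J_l))
        cut = incident - 2 * inside  # edges with exactly one endpoint inside
        tinf += cut / inside
    return tinf
```

In particular, a partition pair that cuts no edges has total interference 0.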
Engineering complexity. Typically, in clustering problems, the smaller the number of clusters, the more “efficient” the best available solution with the given number of clusters is. In our problem, this is indeed the case: Dai and Bai showed (see [3], Theorem 1) that the optimum value of the total interference increases monotonically as the number of clusters increases. It follows that if the number of clusters is not restricted, then the best solution is to consider one giant cluster containing every base station and user.
However, the engineering necessity of managing the coordination between the base stations of a large cluster is usually quite prohibitive and depends heavily on the boundary conditions. In this paper, the number of clusters is treated as part of the input.
The main result of this paper is a new and fast heuristic algorithm (the Dot-product clustering algorithm) for the following problem.
Problem 1 (Total interference minimization problem). Find an IF-cluster system $\mathcal{P}$ that minimizes (2).
As we mentioned earlier, the majority of clustering problems in general, and the total interference minimization problem in particular, are NP-hard. However, our Problem 1 lives on bipartite graphs, so the complexity results on general clustering problems do not apply to it automatically. Nevertheless, Problem 1 is also NP-hard. A vast number of graph partitioning problems are similarly naturally defined on bipartite graphs. For example, typical machine learning and data mining applications, such as product recommendation in e-commerce and topic modeling in natural language processing, are naturally represented on bipartite graphs.
Thus, it is not surprising that already in 2001, in [20], the spectral graph partitioning algorithm was used to co-cluster documents and words (in bipartite graphs). Since then, the spectral clustering method has often been applied to solve other bipartite partitioning problems.
As we mentioned earlier, the model above was introduced by Dai and Bai [3]. Their approach was static: they evaluated the input, and then they clustered the base stations and the users simultaneously to minimize the total interference. To that end, the spectral clustering method was applied to construct the cluster system. The developed method solves a relaxed quadratic programming problem and constructs the clusters by discretizing the continuous solution. If a derived solution contains a partition class without base stations, then the solution is dismissed. This approach is static since it does not provide an efficient method to deal with small, dynamic changes as time passes.
However, this static approach leaves much to be desired, since we should consider some additional objectives. For the initialization of the base station/user clustering in our wireless communication network, we want a fast, centralized algorithm, such as our proposed clustering algorithm for the total interference minimization problem. Furthermore, during the routine operation of the network, dynamic changes may occur: some users may move away from the BSs of a given cluster, some may finish calls, while others (currently not represented in the bipartite graph) may initiate calls. While these changes can be managed in a centralized fashion, this would not be practical. Instead, we need an incremental algorithm that can adaptively change the edge weights and/or update the actual vertices, and can manage the re-clustering of the affected users. It is advantageous to manage these local changes distributively, by the users themselves. Finally, every few seconds, it is useful to run the centrally managed algorithm again to find a new clustering solution. Since the proposed algorithm has low complexity, this approach is clearly beneficial. We return to this question at the end of Section 4.1.
Our proposed DP-Similarity Clustering algorithm can handle all these issues as well. As the simulations in Section 4 show, it is fast and provides high-quality solutions compared to the spectral clustering algorithm.
3. Dot-Product Clustering Algorithm for the Total Interference Minimization Problem
In this section, we describe our new, simple heuristic algorithm for the total interference minimization problem. As we already mentioned, our algorithm runs in three phases. In the first phase, a similarity function is introduced; this phase contains the novelty of our approach. Our similarity measure depends on the relations between the base stations and the users.
In the second phase, the base stations are clustered on the basis of the similarity measure. Here we have significant freedom in choosing the clustering algorithm. The simplest possible method is arguably a hierarchical clustering method. For simplicity, we use such a method in this paper, but this choice may adversely affect the stability of the solutions: it may produce unbalanced cluster sizes, and it can also introduce too much engineering complication. It is possible that some backtracking or averaging approach could reduce the variance in the quality of the solutions.
Finally, the third phase assigns users to the base station clusters. By design, the output of our algorithm is always an IFcluster system. However, it would be beneficial to study methods to balance the number of users in the cluster classes.
3.1. Phase 1: The Dot-Product Similarity Measure
A superficial study of Equation (2) says that we want to decompose the graph in such a way that the clusters contain heavy (high-weight) edges, and the cuts among the clusters consist of light edges. The weight function is described via the matrix $W={[{w}_{i,j}]}_{i\in I,\,j\in J}$, where the rows correspond to the BSs and the columns correspond to the users.
Let ${w}_{i,\bullet}$ denote the row vector of ${b}_{i}$ (the $i$th BS), and let ${w}_{\bullet,j}$ denote the column vector of ${u}_{j}$ (the $j$th user). Hence, we can write $W={\left[{w}_{i,\bullet}\right]}_{i\in I}={\left[{w}_{\bullet,j}\right]}_{j\in J}$. Our heuristic is that the higher the correlation between the weight distributions of two BSs, the more beneficial it is to include them in the same cluster. We define the similarity function among the BSs as
$$\rho({b}_{i},{b}_{k}):=\frac{\langle {w}_{i,\bullet},{w}_{k,\bullet}\rangle}{\parallel {w}_{i,\bullet}\parallel\cdot\parallel {w}_{k,\bullet}\parallel}, \qquad (3)$$
where $\parallel\cdot\parallel$ is the Euclidean norm. The name dot-product refers to the numerator of Equation (3). The similarity $\rho$ depends only on the relations between the users and the BSs. We posit that the larger the value of $\rho$, the greater the similarity between the BSs.
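In other words, the similarity of two BSs is the cosine similarity of their rows of W; a minimal Python sketch (function name ours):

```python
import math

def dp_similarity(row_i, row_k):
    """Dot-product similarity of two BSs: the dot product of their
    signal-strength vectors, normalized by the Euclidean norms."""
    dot = sum(a * b for a, b in zip(row_i, row_k))
    norm_i = math.sqrt(sum(a * a for a in row_i))
    norm_k = math.sqrt(sum(b * b for b in row_k))
    return dot / (norm_i * norm_k)
```

Two BSs whose signal-strength vectors are proportional get similarity 1, while BSs serving disjoint user sets get similarity 0.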
In the total interference minimization model, an ensemble of BSs in a cluster behaves as one base station. Indeed, if the IF-cluster system $\{{P}_{1},\dots,{P}_{M}\}$ minimizes Equation (2), then replacing the ensemble of BSs in cluster ${I}_{\ell}$ with just one new BS ${b}_{\mathrm{new}}$, whose weight to user $j$ is ${\sum}_{k\in {I}_{\ell}}{w}_{k,j}$, preserves the optimum, and the total interference metric attains this optimum on the partition pair ${\mathcal{I}}^{\prime},\mathcal{J}$, where the ℓth class ${I}_{\ell}$ is replaced with the index of ${b}_{\mathrm{new}}$ in ${\mathcal{I}}^{\prime}$. Let us define
$$s_{{B}_{k}}:=\sum_{i\in {B}_{k}}{w}_{i,\bullet} \qquad (4)$$
as the sum of the signal strength vectors of the base stations in ${B}_{k}$. The similarity function $\rho$ can be naturally extended to ensembles of BSs:
$$\rho({B}_{k},{B}_{m}):=\frac{\langle s_{{B}_{k}},s_{{B}_{m}}\rangle}{\parallel s_{{B}_{k}}\parallel\cdot\parallel s_{{B}_{m}}\parallel}. \qquad (5)$$
3.2. Phase 2: Defining BS Clusters
As we discussed earlier, we have great freedom in determining the BS clusters. However, for simplicity, here we apply a hierarchical clustering algorithm: we call it DPH-clustering, short for dot-product hierarchical clustering. The fixed integer M, which is the size of the cluster system, is part of the input. At the beginning, we assign to each BS a singleton cluster containing it. Then we recursively merge the two clusters that have the highest similarity $\rho$ between them until the prescribed number of clusters is reached. As we will soon see, this works reasonably well. Here we want to draw attention to the fact that using normalization in Equations (3) and (5) is a natural idea.
We start with the finest partition: let ${\mathcal{I}}_{0}$ consist of the individual clusters for each BS in $I$ (thus $|I|=|{\mathcal{I}}_{0}|$). We combine two clusters in each of the $|I|-M$ rounds iteratively to derive a sequence of partitions ${\mathcal{I}}_{0},{\mathcal{I}}_{1},\dots,{\mathcal{I}}_{|I|-M}$ of $I$, where $|{\mathcal{I}}_{r}|=|I|-r$. ${\mathcal{I}}_{r}$ is obtained from ${\mathcal{I}}_{r-1}$ by combining the two clusters of ${\mathcal{I}}_{r-1}$ with the largest similarity $\rho$ between them, as defined by Equation (5).
In Algorithm 1, we maintain the similarity measure $\rho({I}_{k},{I}_{m})$ for every pair of clusters ${I}_{k},{I}_{m}\in {\mathcal{I}}_{r}$ as follows. Let us define the symmetric function $\mathit{dot}$ for every pair of clusters as
$$\mathit{dot}({I}_{k},{I}_{m}):=\langle s_{{I}_{k}},s_{{I}_{m}}\rangle. \qquad (6)$$
If $\mathit{dot}$ is already computed for every pair in ${\mathcal{I}}_{r}\times {\mathcal{I}}_{r}$, then the similarity measure $\rho$ can be computed via three scalar operations for any pair of clusters in ${\mathcal{I}}_{r}\times {\mathcal{I}}_{r}$, since
$$\rho({I}_{k},{I}_{m})=\frac{\mathit{dot}({I}_{k},{I}_{m})}{\sqrt{\mathit{dot}({I}_{k},{I}_{k})}\cdot\sqrt{\mathit{dot}({I}_{m},{I}_{m})}}. \qquad (7)$$
Algorithm 1 Hierarchical clustering based on the similarity function $\rho$

function DPH-clustering($I, W, M$)
  ${\mathcal{I}}_{0}\leftarrow \binom{I}{1}$
  $\mathit{dot}\leftarrow W\cdot{W}^{T}$  ▹ matrix multiplication
  for $r=0$ to $|I|-M-1$ do
    $\{{I}^{\prime},{I}^{\prime\prime}\}\leftarrow \arg\max_{\{{I}_{k},{I}_{m}\}\in \binom{{\mathcal{I}}_{r}}{2}}\rho({I}_{k},{I}_{m})$  ▹ (7)
    ${\mathcal{I}}_{r+1}\leftarrow {\mathcal{I}}_{r}-\{{I}^{\prime},{I}^{\prime\prime}\}+\{{I}^{\prime}\cup {I}^{\prime\prime}\}$
    update $\mathit{dot}$  ▹ $\mathcal{O}(b)$ scalar operations
  end for
  return ${\mathcal{I}}_{|I|-M}$
end function
Lemma 1. Let the time complexity of the matrix multiplication $W\cdot{W}^{T}$ be $f(W)$. Using a max-heap data structure, DPH-clustering runs in $f(W)+\mathcal{O}((b-M)\,b\log b)$ time.
Proof. It is easy to see that the running time of Algorithm 1 is at most $f(W)+\mathcal{O}((b-M)\cdot {b}^{2})$, because when two clusters are combined, $\mathit{dot}$ can be updated by summing the corresponding two rows and two columns. Moreover, we can store the $\rho$-values of pairs in ${\mathcal{I}}_{r}\times {\mathcal{I}}_{r}$ in a max-heap: when two clusters are merged, at most $2b$ values need to be removed and at most $b$ new values need to be inserted into the heap, which contains the at most $\binom{b}{2}$ elements of the set $\{\rho({I}_{k},{I}_{m}) : \{{I}_{k},{I}_{m}\}\in \binom{{\mathcal{I}}_{r}}{2}\}$. With these optimizations, the for-loop takes at most $\mathcal{O}((b-M)\cdot b\log b)$ steps. □
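For illustration, a direct Python sketch of Algorithm 1 follows, using a quadratic search for the best pair instead of the max-heap of Lemma 1 (the names are ours). The dot table is maintained as described above, by summing the two corresponding rows and columns on a merge, so $\rho$ is always available via Equation (7):

```python
import math

def dph_clustering(W, M):
    """Greedy hierarchical clustering of the BSs (rows of W) down to M
    clusters, repeatedly merging the pair with the largest similarity rho."""
    b = len(W)
    clusters = [[i] for i in range(b)]
    # dot[k][m] = <s_k, s_m>, where s_k is the summed signal vector of cluster k
    dot = [[sum(x * y for x, y in zip(W[i], W[j])) for j in range(b)]
           for i in range(b)]
    while len(clusters) > M:
        best, pair = -1.0, None
        for k in range(len(clusters)):
            for m in range(k + 1, len(clusters)):
                rho = dot[k][m] / math.sqrt(dot[k][k] * dot[m][m])
                if rho > best:
                    best, pair = rho, (k, m)
        k, m = pair
        # merge cluster m into cluster k; update dot by adding row/column m
        for t in range(len(clusters)):
            dot[k][t] += dot[m][t]
        for t in range(len(clusters)):
            dot[t][k] += dot[t][m]
        clusters[k].extend(clusters[m])
        del clusters[m]
        del dot[m]
        for row in dot:
            del row[m]
    return clusters
```

The two update loops preserve the invariant $\mathit{dot}[k][m]=\langle s_k, s_m\rangle$ after every merge, which is exactly the $\mathcal{O}(b)$-operation update of the proof above.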
In theory, the running time of Algorithm 1 is dominated by the matrix multiplication $W\cdot{W}^{T}$, unless $b$ and $u$ are of different orders of magnitude. There are many techniques to accelerate matrix multiplication, which we do not discuss here. In practice, a plethora of efficient off-the-shelf implementations that take advantage of hardware-accelerated vector instructions are available, so much so that $f(W)$ can be contained in $\mathcal{O}((b-M)\,b\log b)$ for some combinations of the CPU and practical instances of the problem.
3.3. Phase 3: Assigning Users to BS Clusters
Let $\mathcal{I}:={\mathcal{I}}_{b-M}=\{{I}_{1},\dots,{I}_{M}\}$ be the final partition produced by the hierarchical clustering on the primary class. We are looking for the desired final graph partition in the form $\mathcal{P}=\{({I}_{k},{J}_{k}) : k\in\left[M\right]\}$, so it only remains to find an appropriate clustering $\mathcal{J}=\{{J}_{1},\dots,{J}_{M}\}$ of $J$.
We assign each user $j\in J$ to the cluster ${J}_{\ell}$, where ℓ is defined by
$$\ell:=\underset{k\in\left[M\right]}{\arg\max}\sum_{i\in {I}_{k}}{w}_{i,j}. \qquad (8)$$
The choice described by Equation (8) is easy to compute. A high-level pseudocode can be found in Algorithm 2. If no element of $U$ is isolated in $G$, then Algorithm 2 ensures that there are no isolated users in the subgraph induced by the base stations of ${I}_{k}$ and the users of ${J}_{k}$. This is not necessarily the case for the elements of ${I}_{k}$; however, the definition of ${\sum}^{*}$ in Equation (2) takes care of this. This completes the description of our algorithm.
Algorithm 2 Dot-product hierarchical clustering on I, then assigning each element of J to the best cluster.

function DP-Similarity Clustering($G, W, M$)
  $I,J\leftarrow$ index sets of $B$ and $U$
  $\{{I}_{1},\dots,{I}_{M}\}\leftarrow$ DPH-clustering($I, W, M$)
  ${J}_{1},\dots,{J}_{M}\leftarrow$ empty clusters
  for all $j\in J$ do
    $\ell\leftarrow {\arg\max}_{k\in\left[M\right]}{\sum}_{i\in {I}_{k}}{w}_{i,j}$
    add $j$ to ${J}_{\ell}$
  end for
  return $\{({I}_{k},{J}_{k}) : {J}_{k}\ne\varnothing\}$
end function
Lemma 2. Let the time complexity of the matrix multiplication $W\cdot{W}^{T}$ be $f(W)$. Then DP-Similarity Clustering runs in $f(W)+\mathcal{O}(b\cdot((b-M)\log b+u))$ time.
Proof. Follows from Lemma 1 and a cursory analysis of Algorithm 2. □
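Phase 3 is a single pass over the users; a Python sketch of the assignment step of Algorithm 2, given the BS clusters produced by Phase 2 (names ours):

```python
def assign_users(W, bs_clusters):
    """Assign each user j to the BS cluster maximizing the total signal
    strength received from the cluster; empty user classes are dropped,
    matching the return value of Algorithm 2."""
    u = len(W[0])
    user_clusters = [[] for _ in bs_clusters]
    for j in range(u):
        best = max(range(len(bs_clusters)),
                   key=lambda k: sum(W[i][j] for i in bs_clusters[k]))
        user_clusters[best].append(j)
    return [(I_k, J_k) for I_k, J_k in zip(bs_clusters, user_clusters) if J_k]
```

Since every user is assigned to some nonempty BS cluster, the output is always an IF-cluster system whenever no user is isolated in G.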
3.4. Engineering Complexity (Revisited)
In this subsection, we briefly discuss some practical considerations arising in real-life applications. One of the most important is that the clusters cannot be arbitrarily complex (from an engineering point of view) because of the computational overhead of synchronizing too many base stations. Therefore, it may be necessary to impose an upper bound T on the possible number of BSs in any cluster.
The number of clusters is part of the input of the clustering method. One can ask whether there is a way to optimize M. A heuristic attempt is given in [3], using a binary search wrapper over the spectral clustering method to determine an optimal M. The idea is based on a theorem of Dai and Bai stating that the optimal solution is monotonically increasing in M; however, no such guarantee is given for the approximate solutions found by the heuristic algorithms constructing the M-part clusters. For this reason, we do not consider this alternate optimization problem.
4. Experiments
We compared the performance of DP-Similarity Clustering (Algorithm 2) and Spectral Clustering [3] in several scenarios.
We studied and compared our algorithm to a representative of the existing algorithms in two main scenarios that model different distributions of the users. In both scenarios, users are randomly, uniformly, and independently placed into ${[0,1000]}^{2}$ or ${[0,3000]}^{2}$ (squares with an area of $1\,{\mathrm{km}}^{2}$ and $9\,{\mathrm{km}}^{2}$, respectively). The weight (or signal strength) between a base station and a user ${b}_{i},{u}_{j}\in {\mathbb{R}}^{2}$ is determined by
$${w}_{i,j}:=\begin{cases}\max\{\mathrm{dist}({b}_{i},{u}_{j}),{\mathrm{dist}}_{min}\}^{-\alpha} & \text{if } \mathrm{dist}({b}_{i},{u}_{j})\le {\mathrm{dist}}_{max},\\ 0 & \text{otherwise,}\end{cases}$$
where ${\mathrm{dist}}_{min}=1$, ${\mathrm{dist}}_{max}=200$, and the path attenuation (path loss) exponent $\alpha$ is set according to the modeled scenario (the value of $\alpha$ depends heavily on the environment; for further details, see ([1], Section 2)). In other words, a user cannot connect to a base station that is farther away than ${\mathrm{dist}}_{max}$. We assume that in real-world applications, the signal strength values are readily available.
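A Python sketch of instance generation under this weight model (the names and the truncated power-law form are our reading of the setup; the stated constants ${\mathrm{dist}}_{min}=1$ and ${\mathrm{dist}}_{max}=200$ are used):

```python
import math
import random

DIST_MIN, DIST_MAX = 1.0, 200.0

def weight(bs, user, alpha):
    """Signal strength between a BS and a user at positions in R^2:
    inverse-power path loss, floored at DIST_MIN, zero beyond DIST_MAX."""
    d = math.dist(bs, user)
    if d > DIST_MAX:
        return 0.0
    return max(d, DIST_MIN) ** (-alpha)

def random_instance(b, u, side=1000.0, alpha=2.0, seed=0):
    """b BSs and u users placed uniformly in [0, side]^2 (as in Scenario 1);
    returns the b x u weight matrix W and the two point sets."""
    rng = random.Random(seed)
    bss = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(b)]
    users = [(rng.uniform(0, side), rng.uniform(0, side)) for _ in range(u)]
    W = [[weight(bs, us, alpha) for us in users] for bs in bss]
    return W, bss, users
```

Scenario 2 would differ only in the square side, the exponent alpha = 3, and a non-uniform BS density.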
Scenario 1 (urban environment): the first scenario models a city center as the space ${[0,1000]}^{2}$, where the base stations are also randomly, uniformly, and independently placed (see Figure 1a); we assume that the path loss attenuation factor is $\alpha=2$, as suggested by ([1], Section 2).
Scenario 2 (rural environment): the second scenario models an unevenly occupied rural environment, where the placement of the base stations is less restricted: the BSs are placed into ${[0,3000]}^{2}$ with a density function proportional to the sum of the signal strengths of the users, and the path loss attenuation factor is $\alpha=3$, again as suggested by ([1], Section 2).
Figure 1. Scenario 1: A demonstration of the output of the two clustering algorithms on 100 BSs and 200 users. Clustering solutions are shown for $M=20$ (so that the clusters are fairly large and visible). Triangles and circles represent BSs and users, respectively. (a) A total of 100 BSs and 200 users, randomly and uniformly distributed. (b) Comparison of the $\mathtt{tinf}$ values of the solutions provided by the two algorithms as M increases; Spectral Clustering failed for $M=27,30$ because it assigned a couple of users to a cluster without a base station. (c) DP-Similarity Clustering into $M=20$ clusters, Algorithm 2; $\mathtt{tinf}\approx 5.04$. (d) Spectral Clustering into $M=20$ clusters, see [3]; $\mathtt{tinf}\approx 7.12$.
DPH-clustering (Algorithm 1) is guaranteed to return an M-part clustering of the base stations (if $M\le b$). Consequently, the output of DP-Similarity Clustering (Algorithm 2) is always an IF-cluster system (Definition 1), because it assigns each user to the cluster with the largest weight to that user; consequently, every user is served by at least one base station. However, this is not the case for Spectral Clustering, which, in certain scenarios, is very likely to create a cluster with some users but zero base stations; see Figure 2a and Figure 3a.
4.1. Analysis of the Running Times
If the order of magnitude of M is reasonable, i.e., $b-M=\Omega(b)$, then the running times of neither DP-Similarity Clustering nor Spectral Clustering depend too much on the exact value of M; in fact, the difference in running time between $M=20$ clusters and $M=40$ clusters with $b=200$ BSs and $20\le u\le 500$ users is about $5\%$. Thus, the running times in Figure 2d and Figure 3d are measured for reasonable values of M.
Recall Lemma 2. Theoretically, the running time of DP-Similarity Clustering is dominated by the matrix multiplication $W\cdot{W}^{T}$ (computing the similarity measure), but this operation takes less than $0.5\,\mathrm{ms}$ even for a $200\times 500$ matrix (thanks to the accelerated vector operations in the x86-64 instruction set). In this regime of $b\le 200$, most of the running time of DPH-clustering (Algorithm 1) is spent after the matrix multiplication, which is log-quadratic in the number of base stations, but the constant factor of the main term is probably quite large. Thus, to improve the efficiency of the DP-Similarity Clustering algorithm, further development should focus on DPH-clustering.
The running time of Spectral Clustering is a combination of computing the eigenvectors of a matrix of order $b+u$ and subsequently clustering the normalized eigenvectors via the k-means algorithm. Considering the large base matrix and the complex operations performed on it, it is not surprising that Spectral Clustering is an order of magnitude slower than DP-Similarity Clustering.
Let us refer back to our considerations after Problem 1. In real-world applications, users are constantly entering and leaving the network, and some of them may move out of the range of their cells. For this reason, the clustering needs to be updated frequently, but the time complexity may be prohibitive if the number of BSs is very large. However, if the overall changes to W are not large, it is possible to reuse ${\mathcal{I}}_{|I|-M}$, and thus DPH-clustering need not be called every time G or W is updated. In practice, DPH-clustering is called only if the $\mathtt{tinf}$ value of the solution increases beyond a preset threshold. Since users are joined to the respective best cluster by Algorithm 2, if the users know the current ${\mathcal{I}}_{|I|-M}$, they can decide for themselves individually (locally) when to leave and join another cell.
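The local, user-side decision described above can be sketched as follows (names ours): each user knows the current BS clustering and its own signal-strength column, and re-evaluates its best cluster whenever its measurements change:

```python
def best_cluster(signal_col, bs_clusters):
    """Index of the BS cluster maximizing the user's total signal strength."""
    return max(range(len(bs_clusters)),
               key=lambda k: sum(signal_col[i] for i in bs_clusters[k]))

def user_should_move(signal_col, bs_clusters, current):
    """A user re-evaluates its cluster choice locally after its signal
    strengths change; it moves only if another cluster is strictly better.
    Returns the (possibly new) cluster index and whether a move happened."""
    score = lambda k: sum(signal_col[i] for i in bs_clusters[k])
    best = best_cluster(signal_col, bs_clusters)
    if score(best) > score(current):
        return best, True
    return current, False
```

Only the affected user and its candidate clusters are touched, so no central recomputation is needed for such a local update.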
4.2. Observations (Scenario 1)
Let us start by comparing the two algorithms (DP-Similarity Clustering and Spectral Clustering) on an arbitrarily chosen clustering problem in the setting of Scenario 1; after that, we turn to a quantitative comparison.
Figure 1a shows a randomly and uniformly generated placement of 100 BSs and 200 users. The scaling of the $\mathtt{tinf}$ value (see Equation (2)) of the solutions provided by the DP-Similarity Clustering and Spectral Clustering algorithms as a function of the number of clusters M is shown in Figure 1b.
Figure 2 and Figure 3 compare the performance of the algorithms for $b=50$ and 200 BSs. The plots show the mean $\mathtt{tinf}$ values of the solutions provided by the algorithms over 100 random samples of BS–user placements.
4.3. Observations (Scenario 2)
To cover a wider spectrum of parameters, for Scenario 2 we chose a non-uniform distribution for the base stations and set $\alpha=3$ to model a rural environment (see ([1], Section 2)). The measured running times are not plotted because they depend mostly on the size of the matrix W (and only very weakly on M). In fact, it turns out that the overall picture in Scenario 2 is very similar to that in Scenario 1.
Figure 4a and Figure 5a show that as the number of clusters grows, the probability that spectral clustering generates an infeasible solution (containing a cluster without a base station) increases. Even when spectral clustering finds a feasible solution, it usually has a higher $\mathtt{tinf}$ value than the solution produced by Algorithm 2; see Figure 4b and Figure 5b. An example problem generated in the setting of Scenario 2 is visualized in Figure 6, along with the solutions provided by similarity clustering and spectral clustering.
5. Conclusions
We proposed a robust and deterministic algorithm to solve the total interference minimization problem. We demonstrated through analysis and simulation that it provides higher quality solutions than the popular Spectral Clustering method (quality as measured by the $\mathtt{tinf}$ value of the solution). The algorithm runs quickly enough to be considered in realworld applications.
The results of the different simulations plotted in the figures of the previous section demonstrate that our heuristic DP-Similarity Clustering algorithm provides high-quality solutions with low time complexity for the total interference minimization problem in bipartite graphs. Applied to typical wireless networks, it optimizes the total interference in the overall communication network well. Compared with the state-of-the-art spectral clustering method, the proposed algorithm clearly achieves better performance with much lower (computational) complexity.
Next, we list two suggestions for future research problems, whose solutions can considerably increase the usefulness of our algorithm in the wireless network domain.
The first problem is this: suppose that the clustering problem is restricted so that any cluster may contain at most T BSs; assume also that we want to have (at most) M clusters in the solution. However, prescribing both upper bounds M and T may prevent the existence of feasible solutions, for example, if the number of base stations is more than $M\times T$. It is easy to construct examples where the greedy DPH-clustering algorithm (Algorithm 1) is not able to satisfy the two conditions simultaneously, even though many feasible solutions exist. It is very probable that some form of backtracking capability could help tremendously, but we have not yet tried to address this problem.
A more particular problem can be described as follows: our proposed algorithm assigns every BS to some cluster in the total interference minimization problem. However, it is possible that using a certain BS in any cluster causes more interference than not using it at all. This problem can be dealt with by a trivial post-processing procedure: after the clustering $\mathcal{P}=\{{P}_{1},\dots,{P}_{M}\}$ is determined, delete base station i from ${P}_{k}$ if the removal decreases $\overline{w}({P}_{k})/w({P}_{k})$, since removing a BS from a cluster cannot increase the interference fractions of the other clusters.
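This post-processing step can be sketched in Python as follows (names ours; we assume a deleted BS is switched off entirely, so it no longer interferes with any user, and we take the cut value of a class to count edges with exactly one active endpoint in the class):

```python
def prune_cluster(W, I_k, J_k):
    """Greedily switch off BSs of cluster (I_k, J_k) while doing so lowers
    the cluster's interference fraction cut/weight. Other clusters'
    fractions cannot increase, since interference only disappears."""
    b, u = len(W), len(W[0])
    J_set = set(J_k)

    def fraction(bs, off):
        inside = sum(W[i][j] for i in bs for j in J_k)
        cut = (sum(W[i][j] for i in bs for j in range(u) if j not in J_set)
               + sum(W[i][j] for i in range(b)
                     if i not in bs and i not in off for j in J_k))
        return cut / inside

    bs, off = list(I_k), set()
    improved = True
    while improved and len(bs) > 1:
        improved = False
        for i in list(bs):
            rest = [x for x in bs if x != i]
            if fraction(rest, off | {i}) < fraction(bs, off):
                bs, off, improved = rest, off | {i}, True
                break
    return bs
```

Each pass only removes a BS when the cluster's own cut-to-weight ratio strictly decreases, so the procedure terminates after at most $|I_k|-1$ removals.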