Design and Implementation of a New Local Alignment Algorithm for Multilayer Networks

Network alignment (NA) is a popular research field that aims to develop algorithms for comparing networks. Applications of network alignment span many fields, from biology to social network analysis. NA comes in two forms: global network alignment (GNA), which aims to find a global similarity, and local network alignment (LNA), which aims to find local regions of similarity. Recently, there has been an increasing interest in introducing complex network models such as multilayer networks. Multilayer networks are common in many application scenarios, such as modelling of relations among people in a social network or representing the interplay of different molecules in a cell or different cells in the brain. Consequently, the need arises for algorithms that compare such multilayer networks, i.e., that perform their local network alignment. Existing algorithms for LNA do not perform well on multilayer networks since they cannot consider inter-layer edges. Thus, we propose local alignment of multilayer networks (MultiLoAl), a novel algorithm for the local alignment of multilayer networks. We define the local alignment of multilayer networks and propose a heuristic for solving it. We present an extensive assessment indicating the strength of the algorithm. Furthermore, we implemented a synthetic multilayer network generator to build the data for the algorithm’s evaluation.


Introduction
Network theory is one of the most important frameworks for a meaningful description and an efficient analysis of many complex systems [1][2][3]. Most popular analysis algorithms comprise the mining of a single network, e.g., using community detection algorithms [4]. In parallel, the comparison of networks has led to the introduction of many algorithms for comparing the structures on both a global and a local scale [5,6] that fall into the class of network alignment (NA) algorithms. The first class of algorithms, also known as global network alignment (GNA) algorithms, aims to find the overall similarity among networks. Differently, algorithms belonging to the second group are called local network alignment (LNA) algorithms and aim to find (relatively) small regions of similarity. The output of LNA algorithms is a set of matched regions (or subgraphs) among two graphs given as the input.
More recently, in many application fields, e.g., mobile and social networks and connectomics and metabolomics studies, the need for introducing models more complex than traditional networks arises [7,8]. In such contexts, nodes may have different classes of interactions among them, and such interactions may also be time-varying. In particular, networks representing multiple different associations among patients can be represented by a multilayer graph comprised of multiple interdependent graphs, where each graph represents an aspect or a set of similar interactions [9,10]. Figure 1 represents a simple multilayer graph with three layers. Each layer is a different graph G. Edges of a multilayer graph can be intra-layer, i.e., connecting nodes of the same layer, and inter-layer, i.e., connecting nodes of two different layers [11,12].
Figure 1. Example of a multilayer network.
Formally, a multilayer network may be described as a tuple $G_{ml} = (V_L, E^{\mathrm{intra}}_L, E^{\mathrm{inter}}_{L \times L})$, where $L = \{0, 1, \ldots, l\}$ is a set of layers and $E^{\mathrm{inter}}_{L \times L}$ is a set of edges among layers. For each layer $k$, we have a graph $(V_k, E^{\mathrm{intra}}_k)$ (intra-layer edges), and for each pair of layers $k, h$ we have a set of edges $E^{\mathrm{inter}}_{k \times h}$ connecting nodes of the layers $k$ and $h$ [13].
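The tuple definition above maps naturally onto a small container type. The following is a minimal Python sketch, not the authors' R implementation; the class name `MultilayerNetwork`, its fields, and its methods are hypothetical names chosen only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class MultilayerNetwork:
    """Hypothetical container: per-layer nodes and intra-layer edges,
    plus inter-layer edges stored per pair of layers."""
    nodes: dict = field(default_factory=dict)  # layer -> set of nodes
    intra: dict = field(default_factory=dict)  # layer -> set of undirected edges
    inter: dict = field(default_factory=dict)  # (k, h), k < h -> set of (u, v)

    def add_intra_edge(self, layer, u, v):
        self.nodes.setdefault(layer, set()).update((u, v))
        # A frozenset makes the edge undirected: {u, v} == {v, u}.
        self.intra.setdefault(layer, set()).add(frozenset((u, v)))

    def add_inter_edge(self, k, u, h, v):
        if k == h:
            raise ValueError("inter-layer edges must join two different layers")
        self.nodes.setdefault(k, set()).add(u)
        self.nodes.setdefault(h, set()).add(v)
        # Normalise so that the lower layer index always comes first.
        if k > h:
            k, h, u, v = h, k, v, u
        self.inter.setdefault((k, h), set()).add((u, v))


net = MultilayerNetwork()
net.add_intra_edge(0, "G1", "G2")
net.add_intra_edge(1, "D1", "D4")
net.add_inter_edge(1, "D4", 0, "G1")  # stored under the layer pair (0, 1)
```

Keeping the inter-layer edges keyed by layer pair mirrors the per-pair edge sets of the formal definition and makes it cheap to look up all edges between two given layers.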
Examples of multilayer networks come from many different fields, from social network analysis to biological networks. For instance, Figure 2 represents an example of a biological multilayer network representing the interplay among diseases, genes, and drugs. While many efforts have been made to address challenges related to the analysis of a single network, e.g., community detection in multilayer graphs, there is a need for the formalization and introduction of algorithms to compare multilayer networks. A simple strategy is to adapt or directly reuse existing algorithms for LNA. Unfortunately, this strategy is unsuitable, as previously demonstrated for heterogeneous networks [14], because the current algorithms are not able to manage the differences among layers.
Thus, we propose local alignment of multilayer networks (MultiLoAl), a novel algorithm for the local alignment of multilayer networks. We define the local alignment of multilayer networks and propose a heuristic for solving it. MultiLoAl extends the previous L-HetNetAligner [14] and follows the steps depicted in Figure 3. Our algorithm receives two multilayer networks and a set of similarity relationships among nodes of the same layer in both networks, which are used as the seed to build the alignment. For instance, considering biological networks, similarity relations are represented by orthologs. The user may find these relations in databases of orthologs (e.g., OrthoMCL). The algorithm produces a set of multilayer graphs, each representing a single local region of the alignment.
The algorithm merges the two input multilayer graphs into a single one, named the multilayer alignment graph: a multilayer graph with the same number of layers as the two inputs, in which each layer is an alignment graph of the corresponding layers of the two inputs. Each node of a layer k of the alignment graph represents a pair of nodes of the input graphs. After building the alignment graph for each layer separately, we analyse the two input graphs to add the inter-layer edges of the multilayer alignment graph. Finally, the algorithm uses a community detection algorithm suitable for multilayer graphs to detect communities representing local regions of similarity, i.e., each community is a single region of the alignment. The result of our algorithm is a list of mappings among a subset of nodes of two networks, i.e., a set of mapped regions among input graphs.
We also realized a preliminary implementation of our algorithm in the R programming language. Here, we refine that implementation, also in a high-performance computing (HPC) fashion, and provide deeper experimentation on a larger dataset. The main contributions of this paper are: (i) the implementation of a novel algorithm for the local alignment of multilayer networks, (ii) the definition of the local alignment of multilayer networks, (iii) a heuristic for solving it, and (iv) the implementation of a synthetic multilayer network generator to build the data for the algorithm evaluation.
The rest of this paper is organized as follows. Section 2 discusses the background on multilayer networks and multilayer community detection. Section 3 presents the MultiLoAl algorithm. Section 4 presents and discusses the results. Finally, Section 5 concludes the paper.

Alignment of Multilayer Networks
The alignment of networks aims to compare two or more networks [5]. Existing algorithms may be categorised as local or global based on the approach, although other classifications exist, e.g., algorithms for specific networks such as heterogeneous or temporal networks. Local network alignment (LNA) algorithms aim to find similar (relatively) small subnetworks, while global network alignment (GNA) algorithms search for the best superimposition of the whole compared networks. The literature contains many algorithms for other classes of networks (see, for instance, [5]). Unfortunately, the existing algorithms do not perform very well for multilayer networks [11,15].

Community Detection in Multilayer Networks
Community detection is one of the most popular research areas in the study of various complex systems, such as those in biology, sociology, medicine, and transportation [16][17][18]. The reason is that community structures, defined as groups of nodes that are more densely connected to each other than to the rest of the network, represent significant characteristics for understanding the functionalities and organisations of complex systems modelled as networks [16]. Communities are expected to play significant roles in the structure-function relationship. For example, in biological networks such as protein-protein interaction (PPI) networks, the communities represent proteins involved in a similar function; in neuroscience, the communities detected in brain networks correspond to regions of interest (ROI) that are active during tasks; in social networks, communities can be groups of friends or colleagues; in the World Wide Web, the communities represent the web pages sharing the same topic [19]. Thus, the discovery of communities in these systems has become an interesting approach to figuring out how network structure relates to system behaviours.
Discovering a community structure in multilayer networks has become a hot research topic due to the inability of classical community detection methods to deal with the complexity of the multilayer model.
In fact, in multilayer networks, the communities represent groups of well-connected nodes in all layers. Thus, the detection algorithms should take into account the differences among layers. Unfortunately, traditional community detection methods are not able to deal with the complexity of the multilayer networks because (i) they do not enable analysing subsets of the layers and (ii) they do not represent the diverse layers, and thus, they cannot distinguish between different types of multilayer communities [20]. To overcome these limitations, different community detection algorithms for multilayer networks have been recently proposed. For example, Infomap [21], a multilayer generalization of [22], is a method based on random walks. It assumes that an entity randomly following the edges of a network tends to become trapped within communities due to the greater density of edges between nodes within the same community, moving less frequently from one community to another. This algorithm tries to identify a partition of vertices and levels that minimises the generalised map equation, which measures the description length of a random walk on the partition.
GenLouvain [23] is a multilayer generalisation of the iterative Louvain algorithm. This algorithm seeks a partition of the nodes and layers that maximises the multilayer modularity of the network. It searches the global information of the network, finding which are the edges of the network that contribute to the creation of the community structure; then, it applies a novel measure of edge centrality, to classify all the edges of the network concerning their proclivity to propagate information through the network itself.
ABACUS [24] is an algorithm that mines multidimensional communities by extracting frequent closed itemsets from monodimensional community memberships. At first, ABACUS considers each dimension independently, and it mines monodimensional communities. After that, it labels each node with a list of pair tags, i.e., the dimension and the community the node belongs to in that dimension. Then, ABACUS considers each pair of tags as an item, and it applies a frequent-closed-itemset-mining algorithm. Finally, the frequent closed itemsets returned by the mining step describe the multidimensional communities.
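The tagging and itemset-mining steps described for ABACUS can be illustrated with a brute-force Python sketch. It is only a toy under stated assumptions: the per-dimension communities are taken as given, the function name `closed_frequent_itemsets` and the `min_support` parameter are hypothetical, and the exhaustive enumeration is exponential, so it stands in for (and is not) the mining algorithm used by ABACUS.

```python
from itertools import combinations


def closed_frequent_itemsets(memberships, min_support=2):
    """memberships: node -> set of (dimension, community) tags.
    Returns {closed frequent itemset: set of nodes supporting it}."""
    items = sorted({tag for tags in memberships.values() for tag in tags})
    frequent = {}
    # Brute force: test every combination of tags (toy sizes only).
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            support = {n for n, tags in memberships.items()
                       if set(itemset) <= tags}
            if len(support) >= min_support:
                frequent[frozenset(itemset)] = support
    # An itemset is closed if no proper superset has the same support.
    return {i: s for i, s in frequent.items()
            if not any(i < j and s == t for j, t in frequent.items())}


# Nodes a and b share a community in both dimensions; c only in dimension 1.
memberships = {
    "a": {(1, "c1"), (2, "k1")},
    "b": {(1, "c1"), (2, "k1")},
    "c": {(1, "c1"), (2, "k2")},
}
communities = closed_frequent_itemsets(memberships)
```

In this toy input the itemset containing only `(2, "k1")` is frequent but not closed, because adding `(1, "c1")` leaves its supporting nodes unchanged.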
Multilayer clique percolation [25] is a method that extends the popular clique percolation method for simple networks, where dense regions correspond to cliques and adjacency consists of having common nodes. The algorithm extends this step by searching for cliques that encompass multiple layers and by reformulating adjacency so that both common nodes and common layers are required. Multilayer clique percolation communities are combinations of adjacent cliques, so all the edges in these cliques can be considered part of the community.
Multidimensional label propagation (mdlp) [26] is an algorithm based on label propagation. At first, the algorithm assigns a different label to each node, and then, it weights the contribution of each neighbour based on their similarity with the nodes on the different layers. In particular, if two nodes are adjacent on all layers and have the same neighbours, they would have a higher probability of sharing the same label. Finally, the algorithm gives a score for each pair of nodes, referring to how likely a label should be extended from one to the other, bringing a common community.

MultiLoAl Algorithm
Initially, the algorithm takes as input two multilayer networks and a set of similarities among node pairs of the same layer in the input networks. Then, it builds the alignment by performing two main steps: (i) construction of the multilayer alignment graph and (ii) mining of the multilayer alignment graph.
MultiLoAl analyses each pair of corresponding layers of the input graphs separately. For each such pair, it builds an alignment graph, as previously shown in L-HetNetAligner [14]. Then, it analyses the inter-layer edges of the input networks to add inter-layer edges to the multilayer alignment graph. Once the alignment graph is built, we use an algorithm for detecting communities in multilayer networks to uncover relevant modules. Figure 4 shows these steps. Step (i) may be subdivided into two substeps: (i.a) adding nodes and intra-layer edges; (i.b) adding inter-layer edges.
Let us consider two multilayer input graphs $G_1$ and $G_2$. Node colours are used to distinguish different types of nodes belonging to two different types of layers. For simplicity, the two multilayer input networks have the same number of nodes.

Step (1.a): Adding Nodes and Intra-Layer Edges to the Alignment Graph
In the first step, the algorithm considers each pair of corresponding layers separately (see Figure 5). For each layer, it builds an alignment graph following the approach proposed in L-HetNetAligner [14], adapted to the case of one-colour networks, as reported in Figure 6. At this stage, the algorithm, starting from an initial list of seed nodes, constructs two intermediate alignment graphs, which we call alignment graph layer 1 and alignment graph layer 2, for the two networks belonging to layer 1 and the two networks belonging to layer 2. We define the alignment graph $G_{al} = (V_{al}, E_{al})$ as a graph constructed from the two initial input graphs. The selection of node pairs is guided by the input similarity relationships: each node is matched with the most similar node of the other network through the input similarity relationships, i.e., the seed nodes, so each node of the alignment graph represents a pair of similar nodes from the input networks; see Figure 7.
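The node-selection rule for one layer can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' R code: the function name `alignment_nodes`, the triple format of the seeds, and the tie-breaking rule (the first seed seen wins) are all assumptions.

```python
def alignment_nodes(seeds):
    """Return one composite node (u, v) per node u of the first network,
    pairing u with its most similar seed partner v in the second network.

    seeds: iterable of (u, v, similarity) triples for ONE layer.
    ASSUMPTION: on equal similarity, the first pair encountered is kept."""
    best = {}
    for u, v, score in seeds:
        if u not in best or score > best[u][1]:
            best[u] = (v, score)
    return sorted((u, v) for u, (v, _) in best.items())


# Hypothetical seeds for layer 1: G1 has two candidate partners.
seeds_layer1 = [("G1", "G1", 1.0), ("G1", "G3", 0.4), ("G2", "G2", 0.9)]
pairs = alignment_nodes(seeds_layer1)
```

Each returned pair plays the role of one node of the alignment graph of that layer; running the function once per layer yields the node sets of the multilayer alignment graph.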
Once all nodes have been added to the graph, the algorithm builds the edges of the alignment graphs. For each pair of nodes, the algorithm examines the two input graphs, and it inserts and weights the edges considering three conditions: match, mismatch, and gap. Let us consider the nodes of the alignment graphs; in particular, let us consider the pair of nodes (G1 − G1) and (G2 − G2) in Figure 6. To determine the presence of an edge, we consider the edge (G1, G2) in the $G_1$ network and the edge (G1, G2) in the $G_2$ network. If $G_1$ and $G_2$ contain these nodes and the nodes are adjacent in both networks, there is a match, which we call, for convenience, a homogeneous match, since the nodes of the two networks are of the same type (see Figure 8a). Let ∆ = 2 be the node distance threshold, i.e., the threshold on the length of the shortest connecting path used to discriminate between gaps and mismatches. If $G_1$ and $G_2$ contain these nodes, the nodes are adjacent only in a single network, and they lie beyond the gap threshold in the other network, there is a mismatch, which we call a homogeneous mismatch (Figure 8b).
If $G_1$ and $G_2$ contain these nodes, the nodes are adjacent only in a single network, and they are at a distance less than ∆ (the gap threshold) in the other network, there is a gap, which we call a homogeneous gap (Figure 8c). After the edges of the alignment graphs are added, a weight is assigned to each edge by applying an ad hoc scoring function F and the gap threshold ∆. The function assigns a higher score to matches than to mismatches and gaps. The choice of scoring function strongly affects the resulting alignment graph and the alignment itself. The algorithm enables the user to choose other values to optimize the quality of the results. In this work, we set the weight assigned to each edge as follows: homogeneous match equal to 1, homogeneous mismatch equal to 0.5, homogeneous gap equal to 0.2.
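The match, mismatch, and gap rules for intra-layer edges can be sketched as a small classifier. This Python sketch is illustrative only: the function names and the adjacency-dictionary representation are hypothetical, and the gap test `d <= delta` is an assumption (the text says "less than ∆"; with ∆ = 2 a strict comparison would admit only adjacent nodes, so the inclusive form is used here).

```python
from collections import deque

WEIGHTS = {"match": 1.0, "mismatch": 0.5, "gap": 0.2}  # values given in the text


def distance(adj, src, dst):
    """Shortest-path length by breadth-first search; None if unreachable."""
    seen, queue = {src}, deque([(src, 0)])
    while queue:
        node, d = queue.popleft()
        if node == dst:
            return d
        for nxt in adj.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, d + 1))
    return None


def classify_edge(adj1, adj2, pair_a, pair_b, delta=2):
    """Classify the edge between composite nodes pair_a=(u1, u2), pair_b=(v1, v2).
    Returns 'match', 'mismatch', 'gap', or None when no edge is inserted."""
    (u1, u2), (v1, v2) = pair_a, pair_b
    in1 = v1 in adj1.get(u1, ())
    in2 = v2 in adj2.get(u2, ())
    if in1 and in2:
        return "match"
    if not (in1 or in2):
        return None
    # Adjacent in exactly one network: measure the distance in the other one.
    d = distance(adj2, u2, v2) if in1 else distance(adj1, u1, v1)
    # ASSUMPTION: "within the gap threshold" is read as d <= delta.
    return "gap" if d is not None and d <= delta else "mismatch"


# Toy layer of two input networks (adjacency as node -> set of neighbours).
adj1 = {"G1": {"G2", "G3", "G4"}, "G2": {"G1"}, "G3": {"G1"}, "G4": {"G1"}}
adj2 = {"G1": {"G2"}, "G2": {"G1", "G3"}, "G3": {"G2"}, "G4": set()}
kind = classify_edge(adj1, adj2, ("G1", "G1"), ("G3", "G3"))
weight = WEIGHTS[kind]
```

Here G1 and G3 are adjacent in the first network and two hops apart in the second, so the edge between the composite nodes is classified as a gap.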

Step (1.b): Adding Inter-Layer Edges
The algorithm adds the inter-layer edges between alignment graph layer 1 and alignment graph layer 2 of the multilayer alignment graph. For each pair of nodes in the multilayer alignment graph, the algorithm examines the corresponding layers of the input graphs. Let us consider the pair of nodes (G1) and (D4) in Figure 8. To determine the presence of an edge, we consider the edge (G1, D4) in the $G_1$ network and the edge (G1, D4) in the $G_2$ network. If both input graphs contain this edge, i.e., the nodes are adjacent in both, there is a match, which we call, for convenience, a heterogeneous match, since the nodes of the two networks are of different types; see Figure 9a. Let us now consider the pair of nodes (G5) and (D2) in Figure 8b. To determine the presence of an edge, we consider the edge (G5, D2) in the $G_1$ network and the edge (G5, D2) in the $G_2$ network. $G_1$ contains the edge (G5, D2), while nodes G5 and D2 are disconnected in $G_2$. Therefore, there is a heterogeneous mismatch (Figure 9b). Then, we set the weight assigned to each edge as follows: heterogeneous match equal to 0.9, heterogeneous mismatch equal to 0.4.
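The inter-layer rule reduces to a membership test in the inter-layer edge sets of the two inputs. The Python sketch below is illustrative and not the authors' code; the function name `inter_layer_edge` and the representation of inter-layer edges as sets of node pairs are assumptions, while the two weights are the ones given in the text.

```python
INTER_WEIGHTS = {"match": 0.9, "mismatch": 0.4}  # values given in the text


def inter_layer_edge(inter1, inter2, pair_a, pair_b):
    """pair_a = (a1, a2) is a composite node of one layer and
    pair_b = (b1, b2) a composite node of another layer.
    inter1 / inter2: sets of (node, node) inter-layer edges of the two inputs.
    Returns (label, weight), or None when neither input has the edge."""
    in1 = (pair_a[0], pair_b[0]) in inter1
    in2 = (pair_a[1], pair_b[1]) in inter2
    if in1 and in2:
        return "heterogeneous match", INTER_WEIGHTS["match"]
    if in1 or in2:
        return "heterogeneous mismatch", INTER_WEIGHTS["mismatch"]
    return None


# Mirrors the example in the text: (G1, D4) is in both inputs,
# (G5, D2) only in the first one.
inter1 = {("G1", "D4"), ("G5", "D2")}
inter2 = {("G1", "D4")}
match = inter_layer_edge(inter1, inter2, ("G1", "G1"), ("D4", "D4"))
mismatch = inter_layer_edge(inter1, inter2, ("G5", "G5"), ("D2", "D2"))
```

No gap category appears here because the text defines only match and mismatch for inter-layer edges.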

Step 2: Detection of Communities on the Alignment Graph
Finally, the alignment graph is mined to discover communities by applying an existing community detection algorithm for multilayer networks [27][28][29][30]; see Figure 10. Our methodology has a general design, so it is possible to mine the final alignment graph by applying a different mining method. In the current version of MultiLoAl, we applied the Infomap algorithm to mine the communities on the alignment graph. However, the user can instead select Generalized Louvain, ABACUS, clique percolation, or mdlp. The output consists of a file that contains the extracted communities as a list of nodes, the weight of each edge, and a string reporting whether the edge is a homogeneous/heterogeneous match, a homogeneous/heterogeneous mismatch, or a homogeneous gap (see an example of the output at https://github.com/mmilano87/MultiLoAl (accessed on 12 August 2022)).
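Because the mining step is pluggable, it can be pictured as a function that accepts any detector. The Python sketch below is an illustration under loud assumptions: connected components are used purely as a stand-in detector (not Infomap or any of the multilayer methods named above), and the names `mine`, `connected_components`, and the edge tuple layout are hypothetical rather than the actual output format of MultiLoAl.

```python
def connected_components(nodes, edges):
    """STAND-IN detector (not Infomap): every connected component of the
    flattened alignment graph is reported as one community."""
    parent = {n: n for n in nodes}

    def find(n):
        while parent[n] != n:
            parent[n] = parent[parent[n]]  # path halving
            n = parent[n]
        return n

    for u, v, _weight, _label in edges:
        parent[find(u)] = find(v)
    groups = {}
    for n in nodes:
        groups.setdefault(find(n), set()).add(n)
    return list(groups.values())


def mine(nodes, edges, detector=connected_components):
    """Run the chosen detector and list, per community, its nodes and the
    (u, v, weight, label) edges that fall inside it."""
    report = []
    for community in detector(nodes, edges):
        inside = [e for e in edges if e[0] in community and e[1] in community]
        report.append((sorted(community), inside))
    return report


nodes = ["G1-G1", "G2-G2", "D4-D4", "D2-D2"]
edges = [
    ("G1-G1", "G2-G2", 1.0, "homogeneous match"),
    ("G1-G1", "D4-D4", 0.9, "heterogeneous match"),
]
report = mine(nodes, edges)
```

Any function with the same `(nodes, edges)` signature could be passed as `detector`, which is the sense in which a different mining method can be swapped in.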

MultiLoAl vs. L-HetNetAligner
MultiLoAl, despite being based on the previous L-HetNetAligner, differs from it in several respects. First, the algorithms have different scopes: MultiLoAl is a local aligner of multilayer networks, while L-HetNetAligner works only on heterogeneous networks. In detail, when building the local alignment, MultiLoAl and L-HetNetAligner share two main general steps: (i) construction of the alignment graph; (ii) mining of the alignment graph. The building of the alignment graph is the first main difference between the two algorithms. MultiLoAl builds a multilayer alignment graph through two substeps: (i) adding nodes and intra-layer edges, following the approach proposed in L-HetNetAligner adapted to the case of one-colour networks; (ii) adding inter-layer edges. This last substep represents the main novelty compared to L-HetNetAligner, because MultiLoAl analyses and adds the edges among different layers of the input networks. In contrast, L-HetNetAligner builds a heterogeneous alignment graph. Initially, L-HetNetAligner defines the nodes of the alignment graph as composite nodes representing pairs of nodes matched by the similarity considerations. It then inserts and weights the edges of the alignment graph, considering the colour of the nodes that each edge links and their distance in the input networks. Finally, once the alignment graph is built, both algorithms mine the alignment graph to discover modules that represent local alignments. MultiLoAl applies a community detection algorithm, Infomap, to mine the final alignment graph. The result consists of the extracted communities as a list of nodes, the weight of each edge, and a string reporting whether the edge is a homogeneous/heterogeneous match, a homogeneous/heterogeneous mismatch, or a homogeneous gap. L-HetNetAligner, instead, uses the Markov clustering (MCL) algorithm to cluster the graph. Each extracted module represents a single region of the alignment.
The result of our algorithm is a list of mappings among a subset of nodes of two networks, i.e., a set of mapped regions among input graphs.

Evaluation of the Quality of the Alignment
The evaluation of the quality of a network alignment is still a matter of debate, even for simple networks [5,31,32]. There exist many measures able to evaluate both the correctness of the alignment and the quality of the obtained alignment [33]. On the other hand, there is no gold standard to benchmark the alignment. Moreover, all the existing measures need to be extended to the multilayer case. Thus, we first introduce novel measures of correctness in the multilayer case (to the best of our knowledge, there are no other available measures), and then we perform an assessment of our method. We first designed a proof of concept to show the ability of our algorithm to map correct nodes and edges by aligning a synthetic network with itself and with some randomised versions.
The correctness of an alignment is usually evaluated by means of the analysis of its topological quality, i.e., the ability to reconstruct the underlying true node mapping well (when such a mapping is known) and if it conserves many edges. For simple networks, F-NC (F-score node correctness) is a measure of node correctness, and it is a combination of two measures: P-NC and R-NC. P-NC is calculated as $\frac{|M \cap N|}{|M|}$, and R-NC is defined as $\frac{|M \cap N|}{|N|}$, where $M$ is the set of node pairs that are mapped under the true node mapping and $N$ is the set of node pairs that are aligned under an alignment $f$.
We here extended such a measure to the multilayer case. We first considered each layer separately and calculated the F-NC for each layer. Then, we computed the multilayer F-NC as the average of all per-layer F-NC values.
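The per-layer measure and its multilayer average can be written down directly from these definitions. The Python sketch below is illustrative: the function names are hypothetical, and combining P-NC and R-NC through their harmonic mean is an assumption, since the text only says that F-NC is "a combination" of the two.

```python
def f_nc(true_pairs, aligned_pairs):
    """F-score node correctness for one layer.
    M = node pairs under the true mapping, N = node pairs under the alignment.
    ASSUMPTION: P-NC and R-NC are combined by their harmonic mean."""
    m, n = set(true_pairs), set(aligned_pairs)
    shared = len(m & n)
    if shared == 0:
        return 0.0
    p_nc = shared / len(m)  # |M ∩ N| / |M|, as defined in the text
    r_nc = shared / len(n)  # |M ∩ N| / |N|, as defined in the text
    return 2 * p_nc * r_nc / (p_nc + r_nc)


def multilayer_f_nc(layers):
    """Average the per-layer F-NC; layers is a list of (M, N) pairs."""
    return sum(f_nc(m, n) for m, n in layers) / len(layers)


layer1 = ({("G1", "G1"), ("G2", "G2")}, {("G1", "G1"), ("G2", "G3")})
layer2 = ({("D1", "D1")}, {("D1", "D1")})
score = multilayer_f_nc([layer1, layer2])  # (0.5 + 1.0) / 2
```

In the toy input, layer 1 recovers one of its two true pairs (F-NC of 0.5) and layer 2 recovers its single pair (F-NC of 1.0), so the multilayer score is their mean.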
Similarly, the edge correctness in the simple case can be measured by considering NCV-GS$^3$, which is a combination of two measures: node coverage (NCV) and generalized S$^3$ (GS$^3$). NCV is the percentage of nodes from $G_1$ and $G_2$ that are also in $G'_1$ and $G'_2$, and GS$^3$ measures how well edges are conserved between $G'_1$ and $G'_2$, where $G_1$ and $G_2$ are two graphs and $G'_1$ and $G'_2$ are subgraphs of $G_1$ and $G_2$ that are induced by the mapping.
We used NCV-GS$^3$ to measure the edge correctness of each layer; then, we averaged these values over all the layers to obtain the multilayer NCV-GS$^3$.
Finally, we should consider the edge correctness for the inter-layer edges. Without loss of information, we considered all the inter-layer edges as a whole, and we calculated the correctness of all the inter-layer edges as NCV-GS$^3$.

Proof of Concept
As a proof of principle, we present the use of MultiLoAl on a dataset consisting of ten synthetic multilayer networks that we built with a graph generator implemented ad hoc in R. An example of the multilayer network and the R function are available on the web site of the project (https://github.com/mmilano87/MultiLoAl (accessed on 12 August 2022)).
All the multilayer networks have 30 nodes and 2 layers, whereas the edges are distributed as depicted in Table 1. First, we aligned each network with respect to itself to show the ability to find known regions of similarity; second, we considered the alignment of the network with respect to an altered version of the network obtained by adding different levels of noise (5%, 10%, 15%, 20%, and 25%) by randomly removing edges from the network. The aim of the test was to demonstrate that the alignment algorithm is capable of producing high-quality alignments with an edge conservation of about 90%. Then, we implemented different versions of the MultiLoAl algorithm by varying the strategy applied to mine the communities on the alignment graph. We executed the experiments on an Intel Core i5 processor, 2.9 GHz, with 4 GB of main memory, running Ubuntu 18.04. MultiLoAl built 60 alignments, and it completed the whole process of alignment in ten seconds.
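The noise procedure (random removal of a fixed percentage of edges) can be sketched as follows. This Python sketch is illustrative and is not the authors' R generator: the function name `remove_edges`, the `seed` parameter, and the choice to apply it to a flat edge list are assumptions, since the text does not say how removals are split between intra-layer and inter-layer edges.

```python
import random


def remove_edges(edges, fraction, seed=0):
    """Return a copy of the edge list with a random `fraction` of edges removed.
    ASSUMPTION: the number of removed edges is fraction * len(edges), rounded."""
    rng = random.Random(seed)  # fixed seed so the noisy copy is reproducible
    edges = list(edges)
    n_remove = round(len(edges) * fraction)
    removed = set(rng.sample(range(len(edges)), n_remove))
    return [e for i, e in enumerate(edges) if i not in removed]


# Toy network with 20 edges, perturbed at the five noise levels from the text.
original = [(i, i + 1) for i in range(20)]
levels = (0.05, 0.10, 0.15, 0.20, 0.25)
noisy = {level: remove_edges(original, level) for level in levels}
```

Aligning each network with itself and with its five noisy copies gives six alignments per network, which is consistent with the 60 alignments reported for the ten synthetic networks.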
To measure the performance of the alignments built with different versions of MultiLoAl, we evaluated the quality of the results by considering the topological aspects of alignments and the number of communities found. At first, the results were evaluated by the topological quality.
We computed the NCV-GS$^3$ and F-NC measures for all alignment networks by considering the intra-layer and inter-layer edges. Tables 2-5 report the results. Tables 6-9 report the mean and standard deviation values of the NCV-GS$^3$ and F-NC measures for each synthetic network aligned with its noisy counterpart. The results show that the quality of the alignment was greater when Infomap was applied to mine the communities. Furthermore, increasing the noise level from 5% to 25% in the original networks caused NCV-GS$^3$ and F-NC to decrease.

Conclusions
Recently, the applications of multilayer networks in social network analysis, in finance, and in biology have been increasing. Multilayer networks can be seen as a set of networks (each network is a distinct layer) connected by inter-layer links. We here focused on the problem of comparing two multilayer networks, highlighting small local regions of similarity. Since existing algorithms for simple networks do not perform well on multilayer networks, we proposed Local Alignment of Multilayer Networks (MultiLoAl), a novel algorithm for the local alignment of multilayer networks. We proposed a heuristic for solving it. Furthermore, we performed an extensive evaluation to reveal the strength of the algorithm. Since we presented the use of MultiLoAl on multilayer synthetic networks, we plan to extend the application to real biological networks.