The Whole Is Greater than the Sum of the Parts: A Multilayer Approach on Criminal Networks †

Abstract: Traditional social network analysis can be generalized to model some networked systems by multilayer structures where the individual nodes develop relationships in multiple layers. A multilayer network is called multiplex if each layer shares at least one node with some other layer. In this paper, we built a unique criminal multiplex network from the pre-trial detention order by the Preliminary Investigation Judge of the Court of Messina (Sicily) issued at the end of the Montagna anti-mafia operation in 2007. Montagna focused on two families who infiltrated several economic activities through a cartel of entrepreneurs close to the Sicilian Mafia. Our network possesses three layers which share 20 nodes. The first captures meetings between suspected criminals, the second records phone calls and the third detects crimes committed by pairs of individuals. We used measures from multilayer network analysis to characterize the actors in the network based on their local edges and their relevance to each specific layer. Then, we used measures of layer similarity to study the relationships between different layers. By studying the actor connectivity and the layer correlation, we demonstrated that a complete picture of the structure and the activities of a criminal organization can be obtained only considering the three layers as a whole multilayer network and not as single-layer networks. Specifically, we showed the usefulness of the multilayer approach by bringing out the importance of actors that does not emerge by studying the three layers separately.


Contextualization
A long time before the development of social network analysis (SNA), people dealt with multiple social networks. Even if this has always been done with no effort, it is not a trivial activity that can be overlooked. One of the possible views of the problem is represented by the connections between individuals through multiple types of relational ties [1][2][3][4].
The world cannot be defined considering different kinds of relationships as ontologically equivalent and without taking into consideration the interactions among different kinds of connections from which invisible relationships emerge. For this reason, we cannot look only at a single kind of relationship within a single social network [1][2][3][4].
A single-layer perspective has been used to study the social interactions in most cases. These interactions have been measured for a long time through the simple graph, which is one of the most powerful tools from SNA. A simple graph is defined as a set of nodes with edges between them. There are no edges that connect a node to itself [2]. Nodes are often called actors, which usually represent individuals or organizations. Edges are also known as connections or links, and they usually represent relationships between individuals such as friendships. The type of graph is defined by edge weight and directionality. If there exists a numerical value (i.e., a weight) attached to each edge, then the graph is weighted. If all edges are bidirectional, then the graph is called undirected. If edges have directionality, then the graph is called directed.
In practical contexts, the most applied set of SNA tools is probably represented by the family of centrality measures. Centrality is an intrinsically relational concept. In fact, an actor needs to have relations to be important [2]. An actor is important if it is connected to other important nodes or to a large number of different nodes. An actor can also be central if his absence breaks the network into many isolated components. The position of the actor within a network can give him power in terms of the control over the information flowing through the network. The number of interactions of each actor is quantified by the most traditional centrality measure, which is the degree.
Social networks are usually treated as monodimensional objects, but they can at least have three different dimensions: (i) a structural dimension, (ii) a compositional dimension and (iii) an affiliation dimension. The full complexity of a social structure can be understood only through these dimensions [1]. The first one corresponds to the simple graph made by actors and their relationships. The second one describes the actors and their personal information. The third one indicates the belonging of these actors to the same organization or family.
Multiple relationships can also be considered according to an alternative conceptual approach as a set of layers (i.e., connected levels) [2][3][4]. Each simple graph with nodes and edges can be organized into multiple layers. Each layer can represent different types of edges, nodes, online social networks, communities, social contexts, and so on. In such a multilayer network, nodes in the different layers refer to a global set of actors, and edges can also connect nodes in the same layer (i.e., intralayer edges) or in different layers (i.e., interlayer edges). Analyzing multiple layers, we can obtain information that is not present in each single layer and would not be achieved considering layers independently of each other.
Many relevant works on multilayer networks have been reviewed and discussed by Kivelä et al. [5], who also presented a general framework for these kinds of networks [6][7][8]. This framework introduced the cumulative constraints on the general model [9], explaining the different types of networks: monoplex or single layer [6], multiplex [10][11][12], interdependent, and networks of networks.
Dickison, Magnani and Rossi [2] wrote a book on the evolution of interconnected social networks, dynamic processes, data collection, data analysis, modeling and mining of multilayer social network systems. According to the authors, we can find multilayer social networks in different contexts. They are generated from data of different sizes, natures and layer semantics. More specifically, during SNA studies, many multilayer networks with actors connected by multiple types of edges have been created from offline questionnaires or interviews. They are often networks of small sizes, and they can be very useful in testing new methods. The only multilayer criminal network is the one described by Bright et al. [13]. This network has eight layers, each with a specific type of edge related to the exchange of drugs, money and other particular resources. The authors explored the actors' strategic positions across the eight layers. They recognized the importance of multiplex data in criminal network analysis, but their results were limited to an aggregate network, which was obtained by summing up all interactions across the whole network while neglecting the layered structure.
In 1991, Sparrow [30] began to explore the relevance to criminal intelligence analysis of some concepts from social network analysis, such as centrality, equivalence and weak ties. The notion of centrality refers to the use of standard centrality metrics from SNA to identify the key actors of a criminal organization. The concept of equivalence refers to the presence in the criminal network of individuals with similar roles or with the same neighborhood. Weak ties are connections among actors who have no other direct connections. Sparrow also identified the three main problems of criminal network analysis: (i) incompleteness (i.e., missing nodes and edges); (ii) fuzzy boundaries (i.e., which individuals to include and which not to include in the network); (iii) dynamics (i.e., criminal networks are not static, but they change over time).
Klerks [31] described a crisis of law enforcement agency (LEA) confidence in the mid-1990s which led to a collaboration between Dutch law enforcement and academics to develop elaborate network approaches to organized crime. Thanks to this collaboration, it was possible to identify powerful positions inside criminal networks and to attribute these positions to specific individual traits or to structural roles covered by these individuals. For example, the centrality of intermediaries was recognized because these figures could monopolize the connections inside criminal networks. Moreover, according to the author, valuable resources and information for criminals could be discovered thanks to social network mapping.
Xu and Chen [17] studied the topology of criminal networks, which are characterized by a high clustering coefficient, a short average path length and a high level of efficiency in terms of communication, information flow and commands. These networks are moreover more vulnerable to attacks when the targets are the bridges connecting different communities within them rather than their high-degree nodes.
Van der Hulst [14] offered a general introduction to SNA as an analytical tool for the study of criminal networks, whose systematic analysis could lead to a more comprehensive understanding of criminal behavior. Some theoretical and key concepts were reviewed by the author together with functional applications, and a tentative protocol for coding and data handling.
Morselli [16] wrote an excellent book applying a social network perspective to a variety of illegal enterprises, focusing on the flexibility of these organizations and their strategies to adjust after losing key members or opportunities. The author examined the structures and dynamics of criminal networks, their key and peripheral players, their balance between efficiency and security, positions and individual traits, the use of legitimate actors in illegal settings, and finally the network adaptation against disruption.
Calderoni [32] discussed the state of the art in the study of organized criminal groups through the application of SNA methods. He used both the academic and law enforcement perspectives to provide an overview of the field development and to describe the existing approaches. The author focused on data sources, the type of network analysis, and the limitations due to the application of SNA to criminal organizations. He also identified the future trends and suggested some promising paths from both a policy and research perspective.
Berlusconi [33] also highlighted the relevance of SNA to analyze criminal groups, for both research and LEA purposes. She discussed how SNA can be applied in various areas related to the criminological context, and how this application could have great value for crime prevention.
Burcher [34] explained how intelligence analysts apply SNA in operational environments, focusing on the identification of key actors, network vulnerabilities, avenues of enquiry, link and attribute weights, and the stage of an investigation at which analysts apply SNA.
Bright et al. [35] discussed the growth in popularity of the use of SNA to study crime and reviewed the challenges related to the use of data extracted from criminal justice records. They offered researchers some recommendations about data collection and preparation when utilizing these kinds of records. Criminal justice records can suffer from a number of limitations in terms of accuracy, validity and reliability. Such data may be incomplete, and they may include transcription errors, aliases or false information. The authors also outlined and discussed the different types of data used across this literature.

Past Approaches
In [36], we proposed a unique multilayer mafia network. The Sicilian Mafia [37,38] is one of the most renowned criminal organizations. Each mafia group is called a cosca, family or clan. The analysis of the mafia social structure inspired great interest in the scientific community [39]. Our multilayer network was built from two simple graphs that captured the meetings and phone calls between pairs of suspected criminals identified by police audio and physical surveillance during an anti-mafia operation known as Montagna [25,29,40]. The Public Prosecutor's Office of Messina (Sicily) concluded this investigation in 2007. Given these two single-layer networks, we built a weighted and undirected multilayer network with 154 actors, 439 intralayer edges, and 2 layers named Meetings and Phone Calls. Then, we focused on the identification of leaders within the two-layer Montagna network. We chose the degree as the descriptor to identify the 20 most important actors in the network. We used three different approaches to compute the degree: (i) on each layer separately, according to its standard definition on simple graphs; (ii) on the aggregated network obtained by merging the two layers of the multilayer network into a single-layer network, again according to its standard definition; and (iii) on the multilayer network, computed as the number of each actor's relational ties on all the layers in which he exists.
In this paper, which is an extended version of [36], we rebuilt the multilayer network by adding a third layer derived from a new simple graph based on the crimes that suspected criminals from the Montagna operation committed together. The resulting network is a weighted and undirected multilayer network with 226 actors, 454 intralayer edges, and 3 layers named Meetings, Phone Calls and Crimes. Then, we studied the new three-layer Montagna network in depth, first using some traditional measures from SNA to study the single layers and then using actor and layer measures from multilayer network analysis to evaluate the importance of the actors in each layer and in the whole structure, as well as the dissimilarity among the layers. To the best of our knowledge, we are the first to build and study a criminal network in the form of a multilayer network and not as an aggregate network. From our analysis, a different understanding of the network structure and of the key actors emerges, which suggests that LEAs should collect more multiplex data in order to reduce their efforts during an investigation. Identifying key actors across several layers could, in fact, better reveal the strategic positions of suspected criminals and so support decisions on targets of surveillance or arrest.

Multilayer Networks
A multilayer network is the most general structure to represent any kind of network [5]. The elementary concept of a graph, also known as single-layer network, is at the base of its structure. Networks with multiple levels, multiple types of edges or other similar features can be represented, adding some structures with layers to nodes and edges.
Definition 2 (Multilayer network). A multilayer network [2] is defined as a quadruple M = (A, L, V, E), where A is the set of actors, L is the set of layers, (V, E) is a simple graph and V ⊆ A × L.
An actor is the real-world concept represented by a node which is an element of the mathematical concept of the graph. An actor can be a person, an organization or an entity which has relationships with other actors. The same actor can be present in different layers, where each layer represents a type of actor or a type of edge between actors. A node represents a specific actor on a specific layer (e.g., a node can be the Facebook or Twitter account of a specific user which is the actor). Using multiple layers, we can represent different types of edges that correspond to the relationships between nodes. Intralayer edges are those among nodes in the same layer. Interlayer edges are those among nodes in different layers.
When a common set of actors is connected through multiple types of edges, a multilayer network can be reduced to a multiplex network.
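Definition 3 (Multiplex network). A multiplex network [5] can be defined as a sequence of graphs $\{G_\alpha\}_{\alpha=1}^{b} = \{(V_\alpha, E_\alpha)\}_{\alpha=1}^{b}$, where $E_\alpha \subseteq V_\alpha \times V_\alpha$ is the set of edges and $\alpha$ is the index for the graphs, and usually $\bigcap_{\alpha=1}^{b} V_\alpha \neq \emptyset$ (i.e., the different layers at least share some nodes).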
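Definitions 2 and 3 can be made concrete with a small data structure. The following is only a minimal sketch in plain Python, not the representation used by any particular library: the class name `MultiplexNetwork`, its methods and the toy actors `a1` to `a4` are all hypothetical, and only intralayer, undirected, weighted edges are modeled.

```python
from collections import defaultdict


class MultiplexNetwork:
    """Toy multiplex network with undirected, weighted, intralayer edges only."""

    def __init__(self):
        # layer name -> actor -> {neighbouring actor: edge weight}
        self.layers = defaultdict(lambda: defaultdict(dict))

    def add_edge(self, layer, a, b, weight=1):
        if a == b:
            raise ValueError("simple graphs have no self-loops")
        self.layers[layer][a][b] = weight
        self.layers[layer][b][a] = weight

    def actors(self):
        """Global actor set A: everyone who appears on at least one layer."""
        return {a for adj in self.layers.values() for a in adj}

    def nodes(self):
        """Node set V, a subset of A x L: one node per (actor, layer) pair."""
        return {(a, layer) for layer, adj in self.layers.items() for a in adj}

    def shared_actors(self):
        """Actors present on every layer (the intersection of the V_alpha)."""
        actor_sets = [set(adj) for adj in self.layers.values()]
        return set.intersection(*actor_sets) if actor_sets else set()


# Hypothetical toy data, not taken from the Montagna network.
net = MultiplexNetwork()
net.add_edge("Meetings", "a1", "a2", weight=3)
net.add_edge("Meetings", "a2", "a3")
net.add_edge("Phone Calls", "a1", "a2", weight=2)
net.add_edge("Phone Calls", "a3", "a4")
net.add_edge("Crimes", "a1", "a3")

print(len(net.actors()), len(net.nodes()), sorted(net.shared_actors()))
```

In this toy case there are four actors but nine nodes, because the same actor yields one node per layer on which he appears.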

Descriptive Measures
One of the main approaches to study multilayer networks consists in applying typical measures of the traditional SNA to each layer separately and then comparing these results [2]. This kind of approach can be useful for producing an initial overview of the data before applying truly multilayer measures.
Different layers have in fact their specific characteristics in terms of number of nodes or edges, edge directionality, graph density, clustering coefficient, components, or network diameter when observed one at a time.

Definition 4 (Directionality). Edge directionality is a key property that defines the type of graph together with edge weight (i.e., a numerical value attached to each edge). If all edges are bidirectional, then the graph is called undirected. If edges have directionality, then the graph is called directed.

Definition 5 (Density). Graph density $\delta$ is a measure of how many edges between nodes exist compared to how many edges between nodes are possible. For an undirected graph with N nodes and L edges, it is defined as:
$$\delta = \frac{2L}{N(N-1)} \tag{1}$$
Definition 6 (Diameter). Graph diameter $d_{max}$ [41] is the maximum shortest path in the graph, where the shortest path (or distance) $d_{v_i v_j}$ between nodes $v_i$ and $v_j$ is the path with the fewest number of edges. A path is a sequence of nodes such that each node is connected to the next node along the path by an edge.
Definition 7 (Average path length). The average path length d [41] is the average distance between all pairs of nodes in the graph. It is defined as:
$$d = \frac{1}{N(N-1)} \sum_{i \neq j} d_{v_i v_j} \tag{2}$$
Definition 8 (Connected component). A connected component cc [41] is a subset of nodes in a graph, so that there is a path between any two nodes that belong to the component, but one cannot add any more nodes to it that would have the same property.
Definition 9 (Largest connected component). The largest or giant connected component lcc can be found typically in real undirected graphs, and it contains most of the nodes in the graph. The rest of the graph usually is divided into a large number of small components disconnected from the others.
Definition 10 (Degree). The degree k [36] is a key property of each node in a graph and represents the number of connections it has to other nodes. For a node $v_i$ of a simple graph (V, E), it is defined as:
$$k_i = \left|\{v_j \in V : (v_i, v_j) \in E\}\right| \tag{3}$$
Definition 11 (Degree distribution). The degree distribution $p_k$ [41] provides the probability that a randomly selected node $v_i$ in the graph has degree k. For a graph with N nodes, the degree distribution is a normalized histogram given by:
$$p_k = \frac{N_k}{N}$$
where $N_k$ is the number of nodes with degree equal to k.
Definition 12 (Average clustering coefficient). The average clustering coefficient C [41] is the probability that two neighbors of a randomly selected node link to each other. It is defined as:
$$C = \frac{1}{N} \sum_{i=1}^{N} C(v_i)$$
where $C(v_i)$ is the clustering coefficient of a node $v_i$, which captures the degree to which the neighbors of $v_i$ link to each other. It is defined as:
$$C(v_i) = \frac{2 L_i}{k_i (k_i - 1)}$$
where $L_i$ represents the number of edges between the $k_i$ neighbors of the node $v_i$.
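The single-layer measures above can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions rather than the code used in the study: the graph is undirected and stored as a hypothetical adjacency dictionary `adj`, density uses the undirected form 2L/(N(N-1)), nodes with fewer than two neighbors get clustering 0, and the average path length and diameter are taken over reachable pairs only, since distances between different components are undefined.

```python
from collections import Counter, deque


def density(n_nodes, n_edges):
    """Existing edges over the N(N-1)/2 possible undirected edges."""
    if n_nodes < 2:
        return 0.0
    return 2 * n_edges / (n_nodes * (n_nodes - 1))


def bfs_distances(adj, source):
    """Shortest-path length (in edges) from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def connected_components(adj):
    seen, components = set(), []
    for v in adj:
        if v not in seen:
            component = set(bfs_distances(adj, v))
            seen |= component
            components.append(component)
    return components


def degree_distribution(adj):
    """p_k = N_k / N as a dictionary {k: p_k}."""
    counts = Counter(len(nbrs) for nbrs in adj.values())
    return {k: c / len(adj) for k, c in counts.items()}


def local_clustering(adj, v):
    nbrs = list(adj[v])
    k = len(nbrs)
    if k < 2:
        return 0.0  # assumption: undefined cases count as 0
    links = sum(1 for i in range(k) for j in range(i + 1, k)
                if nbrs[j] in adj[nbrs[i]])
    return 2 * links / (k * (k - 1))


def average_clustering(adj):
    return sum(local_clustering(adj, v) for v in adj) / len(adj)


def path_stats(adj):
    """Average path length and diameter over reachable ordered pairs."""
    total, pairs, diameter = 0, 0, 0
    for v in adj:
        for u, d in bfs_distances(adj, v).items():
            if u != v:
                total += d
                pairs += 1
                diameter = max(diameter, d)
    return (total / pairs if pairs else 0.0), diameter


# Hypothetical toy graph with two components: {1, 2, 3, 4} and {5, 6}.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3}, 5: {6}, 6: {5}}
n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
components = connected_components(adj)
print(density(len(adj), n_edges), len(components), max(map(len, components)))
print(average_clustering(adj), path_stats(adj), degree_distribution(adj))
```

Under the same undirected assumption, `density(101, 256)` evaluates the formula for a layer with 101 nodes and 256 edges, the sizes quoted for the Meetings graph, and gives roughly 0.051.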

Actor Measures
Actors in multilayer networks can be described by a specific set of metrics [2]. Some of them are direct extensions of their single-layer counterparts and measure actors based on their local edges and relationships. Other measures have no counterpart for single-layer networks. They characterize the relevance of a specific layer or set of layers within the context of the connectivity of the actors.
Definition 13 (Multilayer actor degree). Given a multilayer network M = (A, L, V, E), the degree [2] of an actor a ∈ A on a set of layers L′ ⊆ L is the number of his connections on all these layers. It is defined as:
$$k(a, L') = \left|\{\{(a, l), (a', l')\} \in E : l, l' \in L'\}\right|$$
When L′ = L, the degree of the actor is computed within the whole multilayer network, whereas when the set of layers contains only one layer, the traditional degree is computed as shown in Equation (3) for the actor in that layer.
Definition 14 (Multilayer actor degree deviation). Given a multilayer network M = (A, L, V, E), the degree deviation [2] of an actor a ∈ A on a set of layers L′ ⊆ L is defined as the standard deviation of the degree of a over the layers in L′:
$$\sigma_k(a, L') = \sqrt{\frac{1}{|L'|} \sum_{l \in L'} \left(k(a, l) - \frac{k(a, L')}{|L'|}\right)^2}$$
The degree deviation indicates the presence of an actor in the multilayer network and quantifies to what extent actors have similar or different degrees on the different layers.
Definition 15 (Multilayer actor neighborhood). Given a multilayer network M = (A, L, V, E), the neighborhood [2] of an actor a ∈ A on a set of layers L′ ⊆ L is the number of the neighbors ns of a, i.e., those distinct actors that are connected to a on a specific layer or set of layers. It is defined as:
$$n(a, L') = |ns(a, L')|, \qquad ns(a, L') = \{a' \in A : \{(a, l), (a', l')\} \in E,\ l, l' \in L'\}$$
When computed on a single-layer network, degree and neighborhood coincide. However, the degree of an actor can no longer be defined as the number of adjacent actors in a multilayer network, where actors can be connected to different individuals depending on the layer.
Definition 16 (Multilayer actor exclusive neighborhood). Given a multilayer network M = (A, L, V, E), the exclusive neighborhood [2] of an actor a ∈ A on a set of layers L′ ⊆ L counts the neighbors that are adjacent to a only on the input layers L′. It is defined as:
$$xn(a, L') = \left|ns(a, L') \setminus ns(a, L \setminus L')\right|$$
where the symbol \ indicates the set difference operation.
The exclusive neighborhood considers the actors that are connected exclusively on a specific layer or set of layers, and it is used to explore the role of specific layers for specific actors. If an actor has a high exclusive neighborhood on a layer, this means that this layer is important to maintain the actor connectivity. In fact, if the layer was removed, the actor's neighbors would also disappear.
Definition 17 (Multilayer actor relevance). Given a multilayer network M = (A, L, V, E), the relevance [2] of an actor a ∈ A on a set of layers L′ ⊆ L is the ratio between the neighbors of a on the specific set of layers L′ and the total number of his neighbors. It is defined as:
$$r(a, L') = \frac{n(a, L')}{n(a, L)}$$
The relevance describes the specific signature of each actor, i.e., how strongly the actor is present on the different layers.
Definition 18 (Multilayer actor exclusive relevance). Given a multilayer network M = (A, L, V, E), the exclusive relevance [2] of an actor a ∈ A on a set of layers L′ ⊆ L is the fraction of neighbors directly connected with a through edges belonging only to layers in L′. It is defined as:
$$xr(a, L') = \frac{xn(a, L')}{n(a, L)}$$
The exclusive relevance measures what impact the removal of a specific layer or set of layers would have on the connectivity of an actor, also in this case in terms of neighbors.
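The six actor measures of Definitions 13 to 18 can be illustrated on a toy multiplex network. The sketch below is plain Python and makes several assumptions: the network is a hypothetical dictionary `net` mapping each layer name to an adjacency dictionary of neighbor sets, only intralayer edges exist, and the degree deviation is the population standard deviation over the selected layers. None of the function names come from an existing library.

```python
from math import sqrt


def neighbors(net, actor, layers):
    """Distinct actors adjacent to `actor` on at least one of `layers`."""
    found = set()
    for layer in layers:
        found |= net[layer].get(actor, set())
    return found


def degree(net, actor, layers):
    """Number of edges incident to `actor`, summed over `layers`."""
    return sum(len(net[layer].get(actor, ())) for layer in layers)


def degree_deviation(net, actor, layers):
    """Population standard deviation of the per-layer degrees."""
    degrees = [degree(net, actor, [layer]) for layer in layers]
    mean = sum(degrees) / len(degrees)
    return sqrt(sum((d - mean) ** 2 for d in degrees) / len(degrees))


def neighborhood(net, actor, layers):
    return len(neighbors(net, actor, layers))


def exclusive_neighborhood(net, actor, layers):
    """Neighbors reachable on `layers` and on no other layer."""
    others = [layer for layer in net if layer not in layers]
    return len(neighbors(net, actor, layers) - neighbors(net, actor, others))


def relevance(net, actor, layers):
    total = neighborhood(net, actor, list(net))
    return neighborhood(net, actor, layers) / total if total else 0.0


def exclusive_relevance(net, actor, layers):
    total = neighborhood(net, actor, list(net))
    return exclusive_neighborhood(net, actor, layers) / total if total else 0.0


# Hypothetical toy data: actor "a" meets b, c, d, phones b, commits a crime with e.
net = {
    "Meetings": {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}},
    "Phone Calls": {"a": {"b"}, "b": {"a"}},
    "Crimes": {"a": {"e"}, "e": {"a"}},
}
all_layers = list(net)

print(degree(net, "a", all_layers))                    # 5 incident edges
print(neighborhood(net, "a", all_layers))              # 4 distinct neighbors
print(exclusive_neighborhood(net, "a", ["Meetings"]))  # c and d only
print(relevance(net, "a", ["Meetings"]))               # 3 of 4 neighbors
print(exclusive_relevance(net, "a", ["Meetings"]))     # 2 of 4 neighbors
```

Actor `a` has degree 5 but only 4 neighbors, because `b` is reached on two layers. Removing the Phone Calls layer would cost him no neighbor, which is what an exclusive neighborhood of 0 on that layer expresses.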

Layer Measures
Actor measures such as the relevance can be used to know the role of a layer (or a set of layers) with respect to its actors. In order to know the relationship between different layers (or sets of layers), some measures of layer similarity need to be introduced. Layer similarity can be studied from two different perspectives: the actor-centered perspective and layer-centered perspective [2]. The first one describes the differences between different layers as a sign of different behaviors of the actors, strategically selecting what kinds of connections they want to establish on every layer. The second one describes the differences between layers in terms of interlayer influences, and it can be investigated by applying existing methods to compute correlation and similarities to multilayer networks.
Berlingerio et al. [42] developed the idea of layer correlation as a multilayer network version of the classical Jaccard correlation coefficient.
Definition 19 (Jaccard correlation coefficient). Given two finite sample sets A and B, the Jaccard correlation coefficient [43] is defined as the size of the intersection divided by the size of the union of the sample sets:
$$J(A, B) = \frac{|A \cap B|}{|A \cup B|}$$
Definition 20 (Jaccard layer correlation coefficient). Given a multilayer network M = (A, L, V, E), the Jaccard layer correlation coefficient [42] computes the ratio between the number of pairs of actors connected on all the layers of a set L′ ⊆ L and the total number of pairs of actors connected in at least one layer in L′. It is defined as:
$$J(L') = \frac{\left|\bigcap_{l \in L'} P_l\right|}{\left|\bigcup_{l \in L'} P_l\right|}$$
where $P_l$ denotes the set of pairs of actors connected in each layer l ∈ L′.
The Jaccard coefficient can be used to measure the overlapping of the actors, i.e., the presence of common actors between pairs of layers in a multilayer network.
This coefficient can also be used to measure the overlapping of the edges to know if the common actors between two layers behave in a similar way. This measure can reveal edges that exist in both layers of a pair and actors that are highly connected on one layer but not on the other. If no pair of actors connected by an edge in one layer is also connected in the other layer, the value of edge overlapping will be equal to 0. On the contrary, this value will be 1 if exactly the same pairs of actors are connected in both layers.
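The actor and edge overlaps described above both reduce to the Jaccard coefficient of two sets. The following minimal sketch assumes each layer is a hypothetical adjacency dictionary of neighbor sets and, by convention, returns 0 when both sets are empty; the layer contents are toy data.

```python
def jaccard(x, y):
    """|x ∩ y| / |x ∪ y|, with 0.0 for two empty sets (a convention)."""
    union = x | y
    return len(x & y) / len(union) if union else 0.0


def actor_set(layer):
    """Actors that appear on the layer."""
    return set(layer)


def edge_set(layer):
    """Undirected edges as frozensets, so (u, v) and (v, u) coincide."""
    return {frozenset((u, v)) for u, nbrs in layer.items() for v in nbrs}


# Hypothetical toy layers.
meetings = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
phone_calls = {"a": {"b"}, "b": {"a"}, "c": {"d"}, "d": {"c"}}

actor_overlap = jaccard(actor_set(meetings), actor_set(phone_calls))
edge_overlap = jaccard(edge_set(meetings), edge_set(phone_calls))
print(actor_overlap, edge_overlap)
```

Here three of four actors are common to both layers, yet only one of the four distinct edges is, so a high actor overlap does not imply that the common actors behave alike on the two layers.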
De Domenico et al. [8] introduced an actor-centered approach trying to quantify the similarity between the degree of actors across various layers. In this case, it is possible to use the Pearson correlation coefficient.
Definition 21 (Pearson correlation coefficient). Given two random variables a and b, the Pearson correlation coefficient [44] is defined as:
$$\rho_{a,b} = \frac{\mathrm{cov}(a, b)}{\sigma_a \sigma_b}$$
where cov(a, b) is the covariance of a and b and $\sigma_a \sigma_b$ is the product of their standard deviations.

Definition 22 (Pearson interlayer correlation coefficient). Given a multilayer network M = (A, L, V, E), the Pearson correlation coefficient [12] between two layers l and l′ is defined as:
$$P_r(l, l') = \frac{\mathrm{cov}\left(k(a, l), k(a, l')\right)}{\sigma_{k(a,l)}\, \sigma_{k(a,l')}}$$
where k(a, l) and k(a, l′) are the degrees of actor a, respectively, at layer l and layer l′, and $\sigma_{k(a,l)} \sigma_{k(a,l')}$ is the product of their standard deviations.
For each pair (actor, layer), the Pearson correlation does not depend on the connected actors but only on the number of incident edges on an actor.
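A degree correlation between two layers can be sketched as follows. This is an illustration under explicit assumptions, not the procedure of any specific library: the actors considered are those present on at least one of the two layers, an actor missing from a layer is given degree 0 there, and the result is NaN when either degree sequence is constant. The layers are hypothetical toy data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equally long numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    std_x = sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    std_y = sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    if std_x == 0 or std_y == 0:
        return float("nan")  # correlation is undefined for a constant sequence
    return cov / (std_x * std_y)


def degree_correlation(layer_1, layer_2):
    """Correlate the per-actor degrees of two layers."""
    actors = sorted(set(layer_1) | set(layer_2))
    k1 = [len(layer_1.get(a, ())) for a in actors]  # degree 0 if absent
    k2 = [len(layer_2.get(a, ())) for a in actors]
    return pearson(k1, k2)


# Hypothetical toy layers: "a" is the hub on both.
meetings = {"a": {"b", "c", "d"}, "b": {"a"}, "c": {"a"}, "d": {"a"}}
phone_calls = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}

rho = degree_correlation(meetings, phone_calls)
print(round(rho, 4))
```

Only the degree sequences enter the computation, so the coefficient would be unchanged if the hub were connected to entirely different actors on the second layer, as long as the degrees stayed the same.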
The difference between the degree distributions in different layers can instead be quantified using the Jeffreys dissimilarity function [45].
Definition 23 (Jeffreys dissimilarity function). The Jeffreys dissimilarity function [46] is a symmetrized relative entropy (entropy being a measure of the inherent uncertainty or randomness of a single random variable). Given two ensembles A and B, it is defined as:
$$D_J(A, B) = \sum_x \left[P(A_x) - P(B_x)\right] \log \frac{P(A_x)}{P(B_x)}$$
where $P(A_x)$ and $P(B_x)$ are the probabilities of the propositions $\{A_x\}$ and $\{B_x\}$ indexed by the sample space x.
Definition 24 (Jeffreys layer dissimilarity function). The Jeffreys layer dissimilarity function [47] is used to compare two layers as a distance between discrete distributions (e.g., the degree distribution) based on the distance between histograms. It is defined as:
$$D_J(l, l') = \sum_k \left(p_k^{l} - p_k^{l'}\right) \log \frac{p_k^{l}}{p_k^{l'}}$$
where $p_k^{l}$ is the relative frequency of the kth degree value in a layer l.
Two layers are more dissimilar when Jeffreys divergence values are higher.
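The sketch below shows one way to evaluate a Jeffreys divergence between two degree histograms, written as the sum of the two directed relative entropies with natural logarithms. Two points are assumptions rather than part of the definitions above: degree values missing from one histogram would make the logarithm diverge, so a small constant `eps` is added to every probability (without renormalizing), and the choice of logarithm base only rescales the result. How a given library treats empty bins may differ.

```python
from collections import Counter
from math import log


def degree_histogram(layer, actors):
    """Relative frequency of each degree value over the given actors."""
    counts = Counter(len(layer.get(a, ())) for a in actors)
    return {k: c / len(actors) for k, c in counts.items()}


def jeffreys(p, q, eps=1e-9):
    """Symmetrized relative entropy: sum over k of (p_k - q_k) * ln(p_k / q_k)."""
    total = 0.0
    for k in set(p) | set(q):
        pk = p.get(k, 0.0) + eps  # eps avoids log(0) for empty bins
        qk = q.get(k, 0.0) + eps
        total += (pk - qk) * log(pk / qk)
    return total


# Hypothetical degree distributions of two layers.
p = {1: 0.5, 2: 0.5}
q = {1: 0.25, 2: 0.75}
print(jeffreys(p, p), jeffreys(p, q), jeffreys(q, p))
```

The divergence is zero for identical histograms, symmetric in its arguments, and grows as the two degree distributions move apart.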

Multilayer Criminal Network Data
As already explained in Section 1, the pre-trial detention of 38 individuals was ordered by the court, which wrote a document of more than two hundred pages containing many details about meetings, phone calls, crimes and other activities among the suspects.
Two graphs were initially built from the analysis of this document: Meetings and Phone Calls [25,29,40]. The Meetings network is made of 101 nodes (i.e., suspected criminals) and 256 edges (i.e., meetings among pairs of suspected criminals). The Phone Calls network is characterized by 100 nodes (i.e., suspected criminals) and 124 edges (i.e., phone calls among pairs of suspected criminals). A total of 47 suspects belong to both networks.
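As a quick consistency check on these figures, the number of distinct suspects across the two graphs follows from inclusion-exclusion, writing $A_M$ and $A_P$ (symbols introduced here only for illustration) for the actor sets of Meetings and Phone Calls:

```latex
\[
|A_M \cup A_P| = |A_M| + |A_P| - |A_M \cap A_P| = 101 + 100 - 47 = 154
\]
```

This agrees with the 154 actors reported for the two-layer network built in [36].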
In the current study, we also built a third graph called Crimes characterized by 25 nodes and 74 edges. It shares with the Meetings and Phone Calls graphs 20 nodes, which are also in this case suspected criminals. Individuals are connected by an edge if they have committed crimes together.
In our previous work [36], we created from the Meetings and Phone Calls graphs a weighted and undirected multilayer network with 154 actors, 439 intralayer edges and 2 layers. In this paper, we rebuilt the multilayer network adding the Crimes layer to the Meetings and Phone Calls layers (see Figure 1). Our new undirected and weighted multiplex network has 226 actors and 454 intralayer edges. Edges in the Meetings layer represent the meetings among suspected criminals; those in the Phone Calls layer refer to the phone communications among distinct phone numbers; those in the Crimes layer refer to common crimes committed by the members of the criminal network. The number of meetings, phone calls or common crimes is encoded by the edge weights. According to Definition 3, our network can be identified as a multiplex network. In fact, a multiplex network requires that each layer share at least one node with some other layer. In our case, Meetings and Phone Calls layers share 47 actors; Meetings and Crimes layers share 25 actors; and Phone Calls and Crimes layers share 20 actors.

Experimental Design
The analysis of our multiplex criminal network was conducted using the Python module uunet.multinet (available at: https://bitbucket.org/uuinfolab/py_multinet/src/master/, accessed on 10 April 2022) created by Magnani, Rossi and Vega to analyze multiplex social networks. This library is the Python version of the multinet library for the analysis of multilayer networks released by the same authors for the R framework [48].
Algorithm 1 shows the design of our experiments.
Algorithm 1
    for each layer l ∈ L do
        compute N, L, dir., |cc|, lcc, δ, C, d and d_max;
    end
    % Actor measures;
    for each actor a ∈ A do
        compute k(a, L);
    end
    set the 20 high degree actors K_A;
    for each actor a ∈ K_A do
        compute σ_k and n(a, L);
        for each layer l ∈ L do
            compute xn(a, l), r(a, l) and xr(a, l);
        end
    end
    % Layer measures;
    for each pair of layers (l, l′) ∈ L do
        compute J_A(l, l′), J_E(l, l′), P_r(l, l′) and D_J(l, l′);
    end
We started with the creation of the three simple graphs Meetings (G_1), Phone Calls (G_2) and Crimes (G_3) described in Section 2.2, and we added them, respectively, as the first layer (l_1), second layer (l_2) and third layer (l_3) of a multiplex network M.
We applied to each layer l some of the most traditional SNA measures such as the number of nodes, number of edges, directionality, number of connected components, size of the largest connected component, density, clustering coefficient, average path length and diameter (see Section 2.1.1).
Then, we studied the actors in the network (see Section 2.1.2). We computed the highest degree actors on the whole multiplex network focusing on the 20 most central actors. We also compared the degree values of each actor in each specific layer computing the standard deviation of the degree. It can be a useful function to estimate what actors possess similar or different degrees on the various layers.
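The selection of the most central actors can be sketched as a ranking by whole-network degree. The snippet below is a hypothetical illustration in plain Python, not the study's code: the dictionary `net`, the function name `top_k_by_degree` and the tie-breaking rule (alphabetical order of the actor label) are all assumptions.

```python
def top_k_by_degree(net, k=20):
    """Rank actors by degree summed over all layers; return the first k."""
    totals = {}
    for layer in net.values():
        for actor, nbrs in layer.items():
            totals[actor] = totals.get(actor, 0) + len(nbrs)
    # Highest degree first; ties broken by actor label (an arbitrary choice).
    ranked = sorted(totals.items(), key=lambda item: (-item[1], str(item[0])))
    return ranked[:k]


# Hypothetical toy network with two layers.
net = {
    "Meetings": {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}},
    "Phone Calls": {"a": {"b"}, "b": {"a", "d"}, "d": {"b"}},
}
print(top_k_by_degree(net, k=2))
```

Because the ranking uses the degree summed over layers, an actor with moderate degree on every layer can outrank one who is prominent on a single layer only.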
Given the set of the 20 most central actors K A , we computed the neighborhood for these actors considering the whole network with all the three layers (L) and the exclusive neighborhood on each layer l. The neighborhood is not evaluated on each layer because on a single layer, neighborhood and degree have the same value. The exclusive neighborhood is calculated on a layer because it allows knowing if a layer is important to maintain the actor connectivity.
We also calculated relevance and exclusive relevance for the actors in K A on each layer l. Relevance and exclusive relevance allow studying the relation between the multilayer network and the actors identifying those who are highly connected on a given layer or actors that are connected uniquely through a layer.
At the end, we compared layers using four different approaches (see Section 2.1.3). We computed the overlapping of actors and of edges using the Jaccard correlation coefficient. Subsequently, we determined the correlation between the degrees using the Pearson correlation coefficient to know if actors with a high degree on one layer had a similar behavior on the other layers. Then, we evaluated the dissimilarity between degree distributions using the Jeffreys dissimilarity function.
Table 1 shows our preliminary analysis of the multiplex network considering each layer as an independent graph. The third layer is the smallest one and presents different characteristics, especially in terms of number of connected components, size of the largest connected component and graph density. The first two layers seem to be more similar, with the exception of the average clustering.
The results obtained applying the actor measures on the Montagna multiplex network are shown in Figure 2 and in more detail in Tables 2-4.
Table 2 contains the degree of the 20 most central actors in the whole network and in each single layer. It also shows the standard deviation of the degree. Our actors are specific individuals involved in criminal activities or members of Mafia families which were at the center of the Montagna operation. We reconstructed the actor roles by reading court documents of the anti-mafia operation. These roles have been analyzed in our previous work [36], with particular reference to the structure of a Mafia family, which is characterized by typical figures such as boss, underboss, consigliere, messaggero, caporegimes, soldiers and associates. Considering the layered structure, the actors we identified as the most central are 18, 47 and 27. These actors are, respectively, a caporegime of the Mistretta family, a deputy caporegime and a caporegime of the Batanesi family. Their roles confirm that they are indeed important.
Caporegimes manage their crew of soldiers (i.e., ordinary criminals) within a Mafia family in a specific geographical location. These actors have similar degrees on the first two layers and different degrees on the third one. The Crimes layer does not include some of the most central actors, such as 68, 12, 22, 11, 43 and 25. Actor 22 is a pharmacist, who can be an important figure because chemical or pharmacological knowledge is required for drug synthesis. Actor 11 is a coordinator of criminal activities in Messina, who is central to understanding the connections of the Mistretta and Batanesi families with other criminal organizations based in Messina. Node 43 is the messaggero, a key figure in a Mafia family who acts as a connection between families. He limits the boss's exposure by reducing the need to meet publicly. Actor 29, an entrepreneur, seems to have equal importance on all three layers, as confirmed by a low degree deviation. Compared with the two-layer multiplex network described in [36], adding the third layer brings to light the importance of entrepreneurs such as actors 54, 64 and 63, who did not appear as prominent figures in the analysis of the first two layers. Entrepreneurs are important Mafia associates because they can help the criminal organization win public tenders and fulfill public contracts fraudulently.
Table 3 shows the neighborhood of the 20 highest-degree actors of the whole Montagna multiplex network and the exclusive neighborhood of the same actors on each layer. Given an actor a, the neighborhood of a counts only the distinct actors who are connected to a within the whole multilayer network. Comparing the second columns of Tables 2 and 3 shows that some actors are connected to the same neighbors on multiple layers. For these actors the degree is higher than the neighborhood, because each such connection adds to the degree while the neighbor is counted only once.
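The difference between degree, degree deviation and neighborhood can be illustrated with a minimal Python sketch. The toy layers and actor ids below are hypothetical and are not taken from the Montagna data; the degree deviation is assumed here to be the population standard deviation of the per-layer degrees.

```python
from statistics import pstdev

# Hypothetical toy multiplex: layer name -> set of undirected edges.
# Actor ids and edges are illustrative, not taken from the Montagna data.
layers = {
    "Meetings": {frozenset(e) for e in [(1, 2), (1, 3), (2, 3)]},
    "Phone Calls": {frozenset(e) for e in [(1, 2), (1, 4)]},
    "Crimes": {frozenset(e) for e in [(1, 5)]},
}


def degree(actor, layer_names):
    """Number of edges incident to `actor`, summed over the given layers."""
    return sum(
        1 for name in layer_names for edge in layers[name] if actor in edge
    )


def degree_deviation(actor):
    """Population standard deviation of the per-layer degrees (assumed definition)."""
    return pstdev([degree(actor, [name]) for name in layers])


def neighbors(actor, layer_names):
    """Distinct actors adjacent to `actor` on at least one of the given layers."""
    found = set()
    for name in layer_names:
        for edge in layers[name]:
            if actor in edge:
                found |= edge - {actor}
    return found


all_layers = list(layers)
total_degree = degree(1, all_layers)               # edges counted once per layer
neighborhood_size = len(neighbors(1, all_layers))  # distinct neighbors only
```

In this toy network, actor 1 has degree 5 but neighborhood 4, because actor 2 is reached on both the Meetings and the Phone Calls layers.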
A low exclusive neighborhood on a layer implies that the layer is not important for maintaining the actor's connectivity. If the Phone Calls layer disappeared, actors such as 51, 48, 64, 12, 25, 63 and 50 would lose no neighbors, so this layer is not essential for them. In the same way, the Crimes layer is not important for the connectivity of actor 47. A high exclusive neighborhood means instead that a layer is important for an actor. Consider, for example, actor 18. If the Meetings layer disappeared, this actor would lose 14 neighbors. If the Phone Calls layer disappeared, he would lose 15 neighbors. If the Crimes layer disappeared, he would lose only 3 neighbors. The cases of actors 12 and 25, who are, respectively, a soldier and a caporegime of the Mistretta family, are peculiar. If the Meetings layer disappeared, these actors would lose, respectively, 15 and 12 neighbors. We deduce that they would lose their central positions, which confirms the results of Table 2, where actor 12 has a degree of 16 and actor 25 a degree of 13 in the Meetings layer. The degree of these actors is 1 in the Phone Calls layer, and they are not present on the Crimes layer.
Table 4 shows each layer's relevance and exclusive relevance for the 20 highest-degree actors of the Montagna multiplex network. A high relevance means that a specific actor has a significant presence on a specific layer. The exclusive relevance is similar to the exclusive neighborhood: it indicates how much the removal of a specific layer would affect the actor's connectivity in terms of neighbors. Relevance and exclusive relevance should be considered together. For example, the presence of actor 12 in the network is totally based on the Meetings layer, which contains 100% of his neighbors. Almost 94% of these neighbors are present only on the Meetings layer.
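As a worked illustration of these measures, the following Python sketch reproduces the counts reported above for actor 12 (16 neighbors on Meetings, 1 on Phone Calls, none on Crimes). The neighbor ids are placeholders, and relevance and exclusive relevance are assumed to be normalized by the neighborhood over all layers, which is consistent with the 100% and 94% figures quoted in the text.

```python
# Toy ego network mirroring the counts reported for actor 12.
# The neighbor ids (n0..n15) are placeholders, not real actors.
ego_neighbors = {
    "Meetings": {f"n{i}" for i in range(16)},
    "Phone Calls": {"n0"},
    "Crimes": set(),
}


def neighborhood(layer_names):
    """Distinct neighbors of the ego on at least one of the given layers."""
    return set().union(*(ego_neighbors[name] for name in layer_names))


def exclusive_neighborhood(layer):
    """Neighbors reachable only through `layer`, i.e. lost if it disappeared."""
    others = [name for name in ego_neighbors if name != layer]
    return ego_neighbors[layer] - neighborhood(others)


def relevance(layer):
    """Share of all neighbors that are reachable on `layer`."""
    return len(ego_neighbors[layer]) / len(neighborhood(ego_neighbors))


def exclusive_relevance(layer):
    """Share of all neighbors that are reachable only on `layer`."""
    return len(exclusive_neighborhood(layer)) / len(neighborhood(ego_neighbors))


lost_without_meetings = len(exclusive_neighborhood("Meetings"))  # 15
meetings_relevance = relevance("Meetings")                       # 1.0
meetings_xrelevance = exclusive_relevance("Meetings")            # 0.9375
```

The single Phone Calls neighbor is also a Meetings neighbor, so removing the Phone Calls layer costs no neighbors, while removing the Meetings layer costs 15 of 16.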

Results
The presence of the key actor 43 (i.e., the messaggero) in the network is largely based on the Meetings layer. In fact, this layer contains 81% of the actor's neighbors, and more than half of these neighbors are present only in the Meetings layer. The Crimes layer contains 88% of the neighbors of actor 63, while 44% of them are present only on this third layer. The Crimes layer contains 76% of the neighbors of actor 54; 58% of them are present only on this third layer. These results support those obtained from the degree computation. The presence of actor 61 is largely based on the Phone Calls layer, which contains 70% of his neighbors, with 54% present only on this layer. Table 2 shows that actor 61 has a higher degree in the Phone Calls layer than in the other layers.
The results obtained by applying the layer measures to the Montagna multiplex network are shown in Figure 3 and, in more detail, in Tables 5-8.
Table 5 shows the Jaccard correlation coefficient computed between pairs of layers in the Montagna multiplex network to verify the presence of the same actors on different layers. If two layers do not share any actor, the coefficient is equal to 0. If two layers contain exactly the same actors, the coefficient is equal to 1. In our case, the strongest overlap is between the actors in the Meetings and Phone Calls layers; it is slightly lower between the Meetings and Crimes layers, and very low between the Phone Calls and Crimes layers.
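The actor overlap between two layers can be sketched in Python as the Jaccard coefficient of the two actor sets. The actor sets below are hypothetical and do not reproduce the values of Table 5.

```python
from itertools import combinations


def jaccard(a, b):
    """Size of the intersection over size of the union; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical actor sets per layer (not the Montagna actors).
actors = {
    "Meetings": {1, 2, 3, 4},
    "Phone Calls": {2, 3, 4, 5},
    "Crimes": {4, 6},
}

# Pairwise comparison of all layers, as in a layer-by-layer table.
actor_overlap = {
    (x, y): jaccard(actors[x], actors[y]) for x, y in combinations(actors, 2)
}
```

Two layers with no common actor yield 0, and two layers with identical actor sets yield 1, matching the two limiting cases described above.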

The overlap between edges in a pair of layers can be checked to learn whether actors are connected to the same other actors on different layers. In this case, if no pair of actors connected on one layer is also connected on the other layer, the Jaccard coefficient is equal to 0. If every pair of actors is connected either on both layers or on neither of them, the Jaccard coefficient is equal to 1. Table 6 shows that there is no overlapping of the edges in our Montagna multiplex network. This means that, in most cases, the actors of our network, who represent suspected criminals, have different connections on different layers.
When the actors in the layers of a multilayer network overlap, it can be interesting to verify whether actors have a high or a low degree on all the layers in which they are present. In other words, we want to know whether these actors behave in a similar way across layers. The Pearson correlation provides this information.
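Both layer comparisons can be sketched in a few lines of Python. The edge lists below are hypothetical, and the Pearson coefficient is computed here over the actors present on both layers, which is an assumption about how the comparison is restricted.

```python
from math import sqrt

# Hypothetical undirected edges on two layers (not the Montagna data).
meetings = {frozenset(e) for e in [(1, 2), (1, 3), (1, 4), (2, 3)]}
calls = {frozenset(e) for e in [(1, 2), (1, 5), (1, 6), (3, 4)]}


def edge_jaccard(edges_a, edges_b):
    """Edges present on both layers over edges present on at least one layer."""
    union = edges_a | edges_b
    return len(edges_a & edges_b) / len(union) if union else 0.0


def degrees(edges):
    """Map each actor to the number of incident edges on one layer."""
    deg = {}
    for edge in edges:
        for actor in edge:
            deg[actor] = deg.get(actor, 0) + 1
    return deg


def pearson_degree(edges_a, edges_b):
    """Pearson correlation of degrees over actors present on both layers."""
    deg_a, deg_b = degrees(edges_a), degrees(edges_b)
    common = sorted(deg_a.keys() & deg_b.keys())
    if len(common) < 2:
        return float("nan")
    xs = [deg_a[v] for v in common]
    ys = [deg_b[v] for v in common]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return float("nan")  # undefined when one degree sequence is constant
    return cov / (sx * sy)


edge_overlap = edge_jaccard(meetings, calls)
degree_correlation = pearson_degree(meetings, calls)
```

In this toy case only one of seven distinct edges appears on both layers, yet the degrees are positively correlated, which shows that edge overlap and degree correlation capture different aspects of layer similarity.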
Table 7 reports the Pearson correlation between actor degrees for each pair of layers.
Table 8 shows the dissimilarity between degree distributions in the form of pairwise comparisons among the layers of the Montagna multiplex network. The dissimilarity is computed using the Jeffreys dissimilarity function: the higher the Jeffreys divergence, the more dissimilar two layers are. In our case, the degree distributions of the three layers are quite dissimilar. In particular, the Crimes layer is the most dissimilar from the other two layers, especially from the Phone Calls layer. The degree distributions for each layer are shown as normalized histograms in Figure 4. Most nodes in the Meetings layer have degree k equal to 2, most nodes in the Phone Calls layer have degree k equal to 1, and most nodes in the Crimes layer have degree k equal to 8.
The Meetings layer reconstructs the meetings among suspected criminals assuming all of them had interactions with each other [40]. This may have overestimated the real number of connections. In fact, some of the participants in these meetings may have had limited or no interactions. However, it should be noted that LEAs were only able to identify the individuals who attended the meetings and not the full extent of their interactions. The Phone Calls layer represents instead the phone calls between suspects who were intercepted by LEAs. Finally, the Crimes layer represents the individuals who were charged with mafia association crimes.
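The Jeffreys dissimilarity between two degree distributions can be sketched as the symmetrized Kullback-Leibler divergence of the normalized degree histograms. The degree sequences below are hypothetical, and the small additive constant `eps` is an assumption introduced only to avoid taking the logarithm of zero for empty bins.

```python
from collections import Counter
from math import log


def degree_distribution(degree_sequence, max_degree, eps=1e-9):
    """Normalized degree histogram over bins 0..max_degree, smoothed by eps."""
    counts = Counter(degree_sequence)
    raw = [counts.get(k, 0) + eps for k in range(max_degree + 1)]
    total = sum(raw)
    return [c / total for c in raw]


def jeffreys(p, q):
    """Jeffreys divergence: KL(p || q) + KL(q || p), computed bin by bin."""
    return sum((pi - qi) * log(pi / qi) for pi, qi in zip(p, q))


# Hypothetical degree sequences of two layers (not the Montagna data).
layer_a_degrees = [2, 2, 2, 1]
layer_b_degrees = [1, 1, 1, 2]

p = degree_distribution(layer_a_degrees, max_degree=2)
q = degree_distribution(layer_b_degrees, max_degree=2)
dissimilarity = jeffreys(p, q)
```

The measure is symmetric and equals zero for identical distributions, so it is suited to pairwise layer comparisons of the kind described above.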


Discussion
In this paper, we used a real criminal dataset related to an anti-mafia operation known as Montagna, which was concluded in 2007. Parsing a two-hundred-page pre-trial detention order by the Court of Messina, we initially built three simple graphs: one for meetings, one for phone calls and one for crimes committed together by the suspected criminals. Some suspects who met and called each other also committed crimes together. Therefore, we identified meetings, phone calls and crimes as layers of a multiplex network, and we created a weighted and undirected multilayer network in which edge weights represent the number of meetings, phone calls or crimes.
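A weighted, undirected multiplex of this kind can be assembled from a list of pairwise events, as in the following Python sketch. The record format and the actor labels are hypothetical; the actual parsing of the court order is not shown.

```python
from collections import Counter

# Hypothetical event records: (layer, actor_a, actor_b).
records = [
    ("Meetings", "A", "B"),
    ("Meetings", "B", "A"),  # same undirected pair, observed a second time
    ("Phone Calls", "A", "B"),
    ("Phone Calls", "A", "C"),
    ("Crimes", "B", "C"),
]


def build_multiplex(records):
    """Return layer -> Counter mapping each undirected edge to its weight."""
    layers = {}
    for layer, a, b in records:
        edge = frozenset((a, b))  # unordered pair, so (A, B) equals (B, A)
        layers.setdefault(layer, Counter())[edge] += 1
    return layers


multiplex = build_multiplex(records)
```

Storing each edge as a `frozenset` makes the graph undirected by construction, and the counter value plays the role of the edge weight.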
We initially studied the three layers of the Montagna multiplex network separately applying some descriptive measures that are typical of SNA such as the number of connected components, the size of the largest connected component, the density, the average clustering coefficient, the average path length and the diameter. Then, we used actor measures from the multilayer network analysis to study the degree of the 20 most central actors, their degree deviation, their neighborhood and exclusive neighborhood, and their relevance and exclusive relevance. These measures are useful to understand the importance of an actor with respect to the whole network and on each specific layer. Finally, we used layer measures from multilayer network analysis to study the overlapping of actors and links, the correlation among the actor degree on each layer and the dissimilarity among the degree distributions on each layer.
Our experiments show that traditional measures usually applied to simple graphs, such as the Jaccard or Pearson correlation coefficients, yield significant results on multilayer networks as well. Thus, the generalization of traditional network science measures appears promising for the study of multilayer networks.
The study of correlation among the layers of a multilayer network, and therefore the study of the overlap of actors and edges in it, allows us to understand whether each layer contains unique information. The lack of overlap among edges indicates that, compared with the study of the single layers, the analysis of multiple layers within a criminal network provides a more nuanced understanding of its structure and of the strategic position of the actors in it.
A key actor in a certain layer may not be central in any other layer, nor in the multilayer network as a whole. The identification of key actors in a criminal network may be of particular salience for law enforcement efforts to reduce the capabilities of that network. LEAs could consider key actors across different layers when deciding on targets for surveillance, intelligence and arrest. For this reason, LEAs should collect multiplex data, including data on different types of edges across a criminal network [13].
Analyzing multiplex data, we could also redefine the concept of criminal. The level of criminality of a specific actor could be quantified based on his connections across the layers.
Multiplex data could also be useful to address the missing data problem in criminal networks, which are usually incomplete, incorrect and inconsistent [25]. LEAs may have limited resources or make unintentional errors. Some individuals unrelated to the criminal organization, such as relatives, friends or other frequent contacts, may appear during the investigations. Moreover, some members of the criminal organization try to avoid detection by using intermediaries, coding messages or refraining from the use of the telephone. In criminal network analysis, missing data can refer to missing nodes and/or missing edges. LEAs aim to obtain reliable results by applying link prediction algorithms to the problem of missing edges, which is a critical impediment to understanding network boundaries and topology. Given a multiplex network, edges on one layer could be predicted from the edges present on another layer. Multiplex link prediction could therefore be even more helpful for researchers and LEAs than standard link prediction.
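To make the idea of predicting edges on one layer from another layer concrete, the following Python sketch ranks actor pairs that are missing from a target layer by a simple score computed on a source layer. The scoring rule (one point for a direct edge plus the number of common neighbors on the source layer) is an illustrative heuristic chosen for this sketch, not a method evaluated in this paper, and the toy edges are hypothetical.

```python
from itertools import combinations

# Hypothetical two-layer multiplex (not the Montagna data).
layers = {
    "Meetings": {
        frozenset(e) for e in [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D")]
    },
    "Phone Calls": {frozenset(e) for e in [("A", "B")]},
}


def adjacency(edges):
    """Build an adjacency map from a set of undirected edges (no self-loops)."""
    adj = {}
    for edge in edges:
        u, v = tuple(edge)
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def predict_links(target, source):
    """Rank pairs absent from `target` by their evidence on `source`."""
    adj = adjacency(layers[source])
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        pair = frozenset((u, v))
        if pair in layers[target]:
            continue  # already observed on the target layer
        direct = 1 if pair in layers[source] else 0
        common = len(adj[u] & adj[v])
        scores[(u, v)] = direct + common
    # Highest score first; ties broken alphabetically for reproducibility.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


ranked = predict_links(target="Phone Calls", source="Meetings")
```

Pairs that met directly and also share a meeting contact rank highest as candidate missing phone-call edges; any real application would need validation against held-out edges.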

Conclusions
Criminal networks are the result of a large number of different pieces of information. In particular, police stakeout and wiretap records are usually used in conjunction with documents from criminal prosecutions, law enforcement reports, and interviews with suspects. The nodes of a criminal network are the individuals who appear in these documents. Communications, meetings, financial transactions, trade in illicit goods, and exchange of particular resources are modeled using the edges.
Meetings and Phone Calls are two criminal networks based on meetings and phone calls among suspected criminals, observed during stakeouts or wiretapped by police during a specific period of time. The Meetings network possesses a greater number of connections because LEAs were only able to identify the participants in the meetings and not the full extent of their interactions. In crowded meetings, some participants may have had very limited (if any) interaction with other participants. In such cases, assuming that all participants interacted with each other may considerably overestimate the real number of connections. This is the reason why the Meetings network is denser than the Phone Calls network. Moreover, we deal with two criminal networks in which communications are supposed to be reduced to keep the criminal organization safe. If two criminals call each other, it is reasonable to believe that they will not meet. In this paper, we also built a third network that represents the connections between subjects who had jointly committed crimes for the purposes of the mafia association. Therefore, it is not recommended to build an aggregated network by adding fictitious edges and placing phone calls, group meetings and associative crimes on the same level. A complete picture can only be obtained by considering the three networks as a whole multilayer network.
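The effect of assuming that all participants in a meeting interacted with each other can be seen in a short Python sketch. The attendance lists are hypothetical; the point is only that a meeting with n participants contributes n(n-1)/2 edges.

```python
from itertools import combinations

# Hypothetical attendance lists: one list of participants per observed meeting.
meetings = [["A", "B", "C", "D"], ["A", "B"]]


def co_attendance_edges(meetings):
    """Connect every pair of attendees of each meeting; weight = shared meetings."""
    weights = {}
    for attendees in meetings:
        for u, v in combinations(sorted(set(attendees)), 2):
            weights[(u, v)] = weights.get((u, v), 0) + 1
    return weights


edges = co_attendance_edges(meetings)
```

Here a single four-person meeting already produces six edges, whereas a phone call always produces exactly one, which is consistent with the higher density of the Meetings network.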
We applied actor and layer measures to the multilayer network, which highlighted the usefulness of the multilayer approach by bringing out the importance of actors that does not emerge by studying the three networks separately. The Montagna operation focused on the Mistretta family and the Batanesi clan, who infiltrated several economic activities, including public works on the Tyrrhenian coast and the Nebrodeo territory, through a cartel of entrepreneurs close to the Sicilian Mafia. For this reason, entrepreneurs should play a central role in winning public tenders and fulfilling public contracts fraudulently. Nevertheless, their importance did not emerge from the analysis of the single layers; it became visible only when considering the multilayer structure. More generally, we demonstrated that a complete picture of the structure and the activities of the criminal organization can be obtained only by considering the three layers as a whole multilayer network and not as single-layer networks.
Our results rely on a single case study, which refers to the Sicilian Mafia, but they can be generalized to other forms of organized crime. Unfortunately, most of the information about criminal networks is not publicly available, which leads to small datasets being available for analysis. A multilayer approach can be applied in every case in which it is possible to derive different networks representing different information from the judicial documents related to a specific investigation.
In future work, we want to apply link prediction and community detection algorithms to our multiplex network and to try to build a network model that can reproduce a criminal multiplex network. Network models can, in fact, help LEAs to predict and prevent the creation of connections between criminals or to break them by arresting one or more of the suspects. Moreover, we intend to evaluate the possibility of turning the Montagna multiplex network into a multilayer network by coupling language networks of the content produced in the phone calls by suspected criminals [49]. Our multiplex network could also be turned into a feature-rich network by encapsulating node-level attributes coming from our criminal records and evaluating whether the information on nodes is informative about community structure [50]. An actor could in fact be categorized as boss, underboss, consigliere, messaggero, caporegime, soldier or associate according to the hierarchical structure of Mafia families [36].