Mapping higher-order network flows in memory and multilayer networks with Infomap

Comprehending complex systems by simplifying and highlighting important dynamical patterns requires modeling and mapping higher-order network flows. However, complex systems come in many forms and demand a range of representations, including memory and multilayer networks, which in turn call for versatile community-detection algorithms to reveal important modular regularities in the flows. Here we show that various forms of higher-order network flows can be represented in a unified way with networks that distinguish physical nodes for representing a~complex system's objects from state nodes for describing flows between the objects. Moreover, these so-called sparse memory networks allow the information-theoretic community detection method known as the map equation to identify overlapping and nested flow modules in data from a range of~different higher-order interactions such as multistep, multi-source, and temporal data. We derive the map equation applied to sparse memory networks and describe its search algorithm Infomap, which can exploit the flexibility of sparse memory networks. Together they provide a general solution to reveal overlapping modular patterns in higher-order flows through complex systems.


Introduction
To connect structure and dynamics in complex systems, researchers model, for example, people navigating the web [1], rumors wandering around among citizens [2], and passengers traveling through airports [3], as flows on networks with random walkers. Take an air traffic network as an example. In a standard network approach, nodes represent airports, links represent flights, and random walkers moving on the links between the nodes represent passengers. This dynamical process corresponds to a first-order Markov model of network flows: a passenger arriving in an airport will randomly continue to an airport proportional to the air traffic volume to that airport. That means, for example, that two passengers who arrive in Las Vegas, one from San Francisco and one from New York, will have the same probability to fly to New York next. In reality, however, passengers are more likely to return to where they come from [4]. Accordingly, describing network flows with a first-order Markov model suffers from memory loss and washes out significant dynamical patterns [5][6][7] (Figure 1a). Similarly, aggregating flow pathways from multiple sources, such as different airlines or seasons in the air traffic example, into a single network can distort both the topology of the network and the dynamics on the network [8][9][10][11] (Figure 1c,d).
As a consequence, the actual patterns in the data, such as pervasively overlapping modules in air traffic and social networks [12], cannot be identified with community-detection algorithms that operate on first-order network flows [4]. To take advantage of higher-order network flows, researchers have therefore developed different representations, models, and community-detection algorithms that broadly fall into two research topics: memory networks and multilayer networks. In memory networks, higher-order network flows can represent multistep pathways such as flight itineraries [4,[13][14][15], and in multilayer networks, they can represent temporal or multi-source data such as multiseason air traffic [8,11,16,17] (Figure 1b). Whereas memory networks can capture where flows move to depending on where they come from (Figure 1e), multilayer networks can capture where flows The system represented as an undirected network with nodes for the objects. The links show where flows coming in to the center node are constrained to go next; (e) The system represented as a memory network with physical nodes for the objects and state nodes for constraining flows along their links. The links show where flows coming in to the center node are constrained to go next depending on where they come from; (f) The system represented as a multilayer network with physical nodes for the objects and state nodes in layers corresponding to different data sources. The links show where the flows coming in to the center node are constrained to go next depending on which layer they are in.
Here we show that describing higher-order network flows with so-called sparse memory networks and identifying modules with long flow persistence times, such as groups of airports that contain frequently traveled routes, provides a general solution to reveal modular patterns in higher-order flows through complex systems. When modeling flows in conventional networks, a single node type both represents a complex system's objects and describes the flows with the nodes' links ( Figure 1c,d). Sparse memory networks, however, discriminate physical nodes, which represent a complex system's objects, from state nodes, which describe a complex system's internal flows with their links (Figure 1e,f). In sparse memory networks, state nodes are not bound to represent, for example, previous steps in memory networks or layers in multilayer networks, but are free to represent abstract states such as lumped states [18] or states in multilayer memory networks, which we demonstrate with multistep and multiquarter air traffic data. In this way, a sparse memory network is a MultiAspect Graph with two aspects: the physical object and the flow state such as memory or layer [17]. We show that various higher-order network flow representations, including memory and multilayer networks, can be represented with sparse memory networks. We also provide a detailed derivation of the information-theoretic map equation for identifying hierarchically nested modules with long flow persistence times in sparse memory networks, and introduce a new version of the map equation's search algorithm Infomap that exploits the flexibility of sparse memory networks.

Modeling Network Flows
The dynamics and function of complex systems emerge from interdependences between their components [19][20][21]. Depending on the system under study, the components can be, for example, people, airports, hospital wards, and banks. The interdependence, in turn, often comes from flows of some entity between the components, such as ideas circulating among colleagues, passengers traveling through airports, patients moving between hospital wards, or money transferred between banks. To efficiently capture such flows through complex systems, researchers model them with random walks on networks.

First-Order Network Flows
In a first-order network representation, the flow direction only depends on the previously visited physical node. That is, for a random walker that moves between physical nodes i ∈ {1, 2, . . . , N P }, which represent objects in a complex system, and in t steps generates a random variable sequence X 1 , X 2 , . . . , X t , the transition probabilities only depend on the previously visited node's outlinks. With link weights w ij between physical nodes i and j, and total outlink weights w i = ∑ j w ij of node i, the first-order transition probabilities are which give the stationary visit rates To ensure ergodic stationary visit rates, from each node we can let the random walker teleport with probability τ, or with probability 1 if the node has no outlinks, to a random target node proportional to the target node's inlink weight [22].
While a first-order model is sufficient to capture flow dynamics in some systems, recent studies have shown that higher-order flows are required to capture important dynamics in many complex systems [5,6,23]. The standard approach of modeling dynamical processes on networks with first-order flows oversimplifies the real dynamics and sets a limit of what can actually be detected in the system ( Figure 1). Capturing critical phenomena in the dynamics and function of complex systems therefore often requires models of higher-order network flows [5][6][7][23][24][25][26][27][28].

Higher-Order Network Flows
In higher-order network representations, more information than the previously visited physical node's outlinks are used to determine where flows go. Examples of higher-order network representations are memory, multilayer, and temporal networks.
In memory networks, the flow direction depends on multiple previously visited nodes. Specifically, for a random walker that steps between physical nodes and generates a sequence of random variables X 1 , . . . , X t , the transition probabilities for a higher-order flow model of order m, depend on the m previously visited physical nodes. Assuming stationarity and m visited nodes from x −m m steps ago to previously visited i in sequence # " x −m x −m+1 · · · x −2 i, the mth-order transition probabilities between physical nodes i and j correspond to first-order transition probabilities P( We use subscript i in α i to highlight the state node's physical node. With link weights w α i β j between state nodes α i and β j , and total outlink weights w α i = ∑ β j w α i β j , the transition probabilities for memory networks are As in a first-order network representation, teleportation can ensure ergodic stationary visit rates In multilayer networks with the same physical nodes possibly present in multiple layers, the flow direction depends on both the previously visited physical node i and layer α ∈ {1, 2, . . . , l}. Similar to the memory representation, the multilayer transition probabilities between physical nodes i in layer α and j in layer β correspond to first-order transition probabilities between states nodes α i and β j .
In many cases, empirical interaction data only contain links within layers such that w α i β j = 0 only for α = β. Without data with links between layers, it is useful to couple layers by allowing a random walker to move between layers at a relax rate r. With probability 1 − r, a random walker follows a link of the physical node in the currently visited layer, and with probability r the random walker relaxes the layer constraint and follows any link of the physical node in any layer. In both cases, the random walker follows a link proportional to its weight. With total outlink weight w α i = ∑ β j w α i β j from physical node i in layer α, and total outlink weight w i = ∑ α,β j w α i β j from physical node i across all layers, which both correspond to total intralayer outlink weights as long as there are only empirical links within layers, the transition probabilities for multilayer networks with modeled interlayer transitions are where δ αβ is the Kronecker delta, which is 1 if α = β and 0 otherwise. In this way, the transition probabilities capture completely separated layers for relax rate r = 0 and completely aggregated layers for relax rate r = 1. The system under study and problem at hand should determine the relax rate. In practice, for relax rates in a wide range around 0.25, results are robust in many real multilayer networks [11]. With both intralayer and interlayer links such that w α i β j = 0 also for α = β, either from empirical data or modeled interlayer links P α i β j (r) → w α i β j , the transition probabilities for multilayer networks can be written in their most general form, where w α i again is the total outlink weight from node i in layer β. For directed multilayer networks, teleportation can ensure ergodic stationary visit rates π α i .

Sparse Memory Networks
While memory networks and multilayer networks operate on different higher-order interaction data, the resemblance between Equations (5) and (7) suggests that they are two examples of a more general network representation. In a memory network model, a random walker steps between physical nodes such that the next step depends on the previous steps (Equation (5)). In a multilayer network model, instead the next step depends on the visited layer (Equation (7)). However, as Equations (5) and (7) show, both models correspond to first-order transitions between state nodes α i and β j associated with physical nodes i and j in the network, Consequently, the state node visit rates are where the sum is over all state nodes in all physical nodes. The physical node visit rates are where the sum is over all state nodes in physical node i. Both memory and multilayer networks can be represented with a network of physical nodes and state nodes that neither are bound to represent previous steps nor current layer. Because state nodes are free to represent abstract states, and redundant state nodes can be lumped together, we call this representation a sparse memory network.

Representing Memory and Multilayer Networks with Sparse Memory Network
To illustrate that sparse memory networks can represent both memory and multilayer networks, we use a schematic network with five physical nodes ( Figure 2a). The network represents five individuals, with the center node's two friends to the left and two colleagues to the right. We imagine that the multistep pathway data come from two sources, say a Facebook conversation thread among the friends illustrated with the top sequence above the network and the solid pathway on the network, and an email conversation thread among the colleagues illustrated with the bottom sequence and the dashed pathway. For simplicity, the pathway among the friends stays among the friends and steps between friends with equal probability. The pathway among the colleagues behaves in corresponding way. We first represent these pathway data with a memory and a multilayer network, and then show that both can be represented with a sparse memory network.
We can represent multistep pathways with links between state nodes in physical nodes. For a memory network representation of a second-order Markov model, each state node captures which physical node the flows come from. For example, the highlighted pathway step in Figure 2a corresponds to a link between state node ε i in physical node i-capturing flows coming to physical node i from physical node j-and state node γ k in physical node k-capturing flows coming to physical node k from physical node i ( Figure 2b). In this way, random walker movements between state nodes can capture higher-order network flows between observable physical nodes.
We can represent multistep pathways with links between state nodes in layers. First, we can map all state nodes that represent flows coming from a specific physical node onto the same layer. For example, we map red state nodes ε i and ε k for flows coming from red physical node j to physical nodes i and k, respectively, in Figure 2b onto the red layer ε at the bottom in Figure 2c. This mapping gives one-to-one correspondence between the memory network and the multilayer network. Therefore we call it a multilayer memory network. An alternative and more standard multilayer representation exploits that the pathway data come from two sources and uses one layer for each data source ( Figure 2d). Network flows are first-order when constrained to move within individual layers and higher-order when free to move within and between layers. The highlighted pathway step in Figure 2a now corresponds to a step in layer η between state node η i and state node η k -capturing flows remaining among friends. Again, random walker movements between state nodes can capture higher-order network flows between observable physical nodes. In our multistep pathway example from two sources, a full second-order model in the memory network is not required and partly redundant to describe the network flows. We can model the same pathways with a more compact description. Specifically, we can lump together state nodes α i and β i in the same physical node i if they have identical outlinks, w α i γ j = w β i γ j for all γ j . The lumped state nodeα i replaces α i and β i and assembles all their inlinks and outlinks such that the transition probabilities remain the same. For example, for describing where flows move from physical node i, state nodes ε i and δ i have identical outlinks-reflecting that it does not matter which friend was previously active in the conversation-as well as state nodes β i and α i -reflecting that it does not matter which colleague was previously active in the conversation. Lumping together all such redundant state nodes, such as ε i , δ i →α i or β i , α i →δ i , gives the sparse memory network in Figure 2e, where state nodes no longer are bound to capture the exact previous steps but still capture the same dynamics as the redundant full second-order model. These unbound state nodes are free to represent abstract states, such as lumped state nodes in a second-order memory network, but can also represent state nodes in a full second-order memory network or state nodes in a multilayer network such as η i →α i in Figure 2d and e. For example, lumping state nodes with minimal information loss can balance under-and overfitting for efficient sparse memory networks that represent variable-order network flow models [18]. Consequently, rather than an explosion of application and data dependent representations, sparse memory networks with physical nodes and state nodes that can represent abstract states provides an efficient solution for modeling higher-order network flows.

Mapping Network Flows
While networks and higher-order models make it possible to describe flows through complex systems, they remain highly complex even when abstracted to nodes and links. Thousands or millions of nodes and links can bury valuable information. To reveal this information, many times it is indispensable to comprehend the organization of large complex systems by assigning nodes into modules with community-detection algorithms. Here we show how the community-detection method known as the map equation can operate on sparse memory networks and allow for versatile mapping of network flows from multistep, multi-source, and temporal data.

The Map Equation for First-Order Network Flows
When simplifying and highlighting network flows with possibly nested modules, the map equation measures how well a modular description compresses the flows [29]. Because compressing data is dual to finding regularities in the data [30], minimizing the modular description length of network flows is dual to finding modular regularities in the network flows. For describing movements within and between modules, the map equation uses code books that connect node visits, module exits, and module entries with code words. To estimate the shortest average description length of each code book, the map equation takes advantage of Shannon's source coding theorem and measures the Shannon entropy of the code word use rates [30]. Moreover, the map equation uses a hierarchically nested code structure designed such that the description can be compressed if the network has modules in which a random walker tends to stay for a long time. Therefore, with a random walker as a proxy for real flows, minimizing the map equation over all possible network clusterings reveals important modular regularities in network flows.
In detail, the map equation measures the minimum average description length for a multilevel map M of N physical nodes clustered into M modules, for which each module m has a submap M m with M m submodules, for which each submodule mn has a submap M mn with M mn submodules, and so on (  (Figure 3a), (11) and the total code word use rate for also visiting nodes in a module is such that the average code word length is Weighting the average code word length of the code book for module mn . . . o at the finest level by its use rate gives the contribution to the description length, In each submodule m at intermediate levels, the code word use rate for exiting to a coarser level is and for entering the M m submodules M mn at a finer level is (Figure 3c).
Therefore, the total code rate use in submodule m is which gives the average code word length Weighting the average code word length of the code book for module m at intermediate levels by its use rate, and adding the description lengths of submodules at finer levels in a recursive fashion down to the finest level in Equation (14), gives the contribution to the description length, At the coarsest level, there is no coarser level to exit to, and the code word use rate for entering the M submodules M mn at a finer level is (Figure 3d), (20) such that the total code rate use at the coarsest level is which gives the average code word length Weighting the average code word length of the code book at the coarsest level by its use rate, and adding the description lengths of submodules at finer levels from Equation (19) in a recursive fashion, gives the multilevel map equation [31]  To find the multilevel map that best represents flows in a network, we seek the multilevel clustering of the network that minimizes the multilevel map equation over all possible multilevel clusterings of the network (see Algorithm 1 in Section 4).
While there are several advantages with the multilevel description, including potentially better compression and effectively eliminated resolution limit [32], for simplicity researchers often choose two-level descriptions. In this case, there are no intermediate submodules and the two-level map equation is

The Map Equation for Higher-Order Network Flows
The map equation for first-order network flows measures the description length of a random walker stepping between physical nodes within and between modules. This principle remains the same also for higher-order network flows, although higher-order models guide the random walker between physical nodes with the help of state nodes. Therefore, extending the map equation to higher-order network flows, including those described by memory, multilayer, and sparse memory networks, is straightforward. Equations (11) to (24) remain the same with i → α i and j → β j . The only difference is at the finest level (Figure 3b). State nodes of the same physical node assigned to the same module should share code word, or they would not represent the same object. That is, if multiple state nodes β j of the same physical node i are assigned to the same module mn . . . o, we first sum their probabilities to obtain the visit rate of physical node i in module mn . . . o, (Figure 3b). (25) In this way, the frequency weighted average code word length in submodule codebook mn . . . o in Equation (13) where the sum is over all physical nodes that have state nodes assigned to module M mn...o . In this way, the map equation can measure the modular description length of state-node-guided higher-order flows between physical nodes. To illustrate that the separation between physical nodes and state nodes matters when clustering higher-order network flows, we cluster the red state nodes and the dashed blue state nodes in Figure 4 in two different modules overlapping in the center physical node. For a more illustrative example, we also allow transitions between red and blue nodes at rate r/2 in the center physical node. In the memory and sparse memory networks, the transitions correspond to links from state nodes in physical node i to state nodes in the other module with relative weight r/2 and to the same module with relative weight 1 − r/2 (Figurs 4b,c). In the multilayer network, the transitions correspond to relax rate r according to Equation (6), since relaxing to any layer in physical node i with equal link weights in both layers means that one half of relaxed flows switch layer (Figure 4a). Independent of the relax rate, in these symmetric networks the node visit rates are uniformly distributed: π α i = 1/6 for each of the six state nodes in the multilayer and sparse memory networks, and π α i = 1/12 for each of the twelve state nodes in the memory network. For illustration, if we incorrectly treat state nodes as physical nodes in the map equation, Equations (11) to (24), the one-module clustering of the memory network with twelve virtual physical nodes, M   Therefore, the two-module clustering gives best compression for all relax rates (Figure 4e). For the sparse memory network with six virtual physical nodes, the one-module clustering, M While the two-module clustering again gives best compression for all relax rates, the code lengths are shifted by 1 bit compared with the memory network with twelve virtual physical nodes (Figure 4e). Same dynamics but different code length. These expanded solutions with virtual physical nodes do not capture the important and special role that physical nodes play as representatives of a system's objects.
Properly separating state nodes and physical nodes as in Equation (26) instead gives equal code lengths for identical clusterings irrespective of representation. For example, the one-module clustering of the memory network with twelve state nodes, M  That is, same dynamics give same code length for identical clusterings with proper separation of state nodes and physical nodes.
For these solutions with physical nodes and state nodes that properly capture higher-order network flows, the overlapping two-module clustering gives best compression with relax rate r 0.71 (Figure 4e). In this example, the one-module clustering can for sufficiently high relax rate better compress the network flows than the two-module clustering. Compared with the expanded clusterings with virtual physical nodes where this cannot happen, the one-module clustering gives a relatively shorter code length thanks to code word sharing between state nodes in physical node i. In general, modeling higher-order network flows with physical nodes and state nodes, and accounting for them when mapping the network flows, gives overlapping modular clusterings that do not depend on the particular representation but only on the actual dynamical patterns.

Infomap
To find the best modular description of network flows according to the map equation, we have developed the stochastic and fast search algorithm called Infomap. Infomap can operate on standard, multilayer, memory, and sparse memory networks with unweighted or weighted and undirected or directed links, and identify two-level or multilevel and non-overlapping or overlapping clusterings. Infomap takes advantage of parallelization with OpenMP and there is also a distributed version implemented with GraphLab PowerGraph for standard networks [33]. In principle, the search algorithm can optimize other objectives than the map equation to find other types of community structures.
Recent versions of Infomap operate on physical nodes and state nodes. Each state node has a unique state id and is assigned to a physical id, which can be the same for many state nodes. Infomap only uses physical nodes and state nodes for higher-order network flow representations, such as multilayer, memory, and sparse memory networks, and physical nodes alone when they are sufficient to represent first-order network flows in standard networks.
To balance accuracy and speed, Infomap uses some repeatedly and recursively applied stochastic subroutine algorithms. For example, the multilevel clustering (Algorithm 1) and two-level clustering (Algorithm 2) algorithms first repeatedly aggregate nodes in modules (Algorithm 3) to optimize the map equation (Algorithm 4) and then repeatedly and recursively fine-tune (Algorithm 5) and coarse-tune (Algorithm 6) the results. Together with complete restarts, these tuning algorithms avoid local minima and improve accuracy.
By default, Infomap tries to find a multilevel clustering of a network (Algorithm 1). The complete two-level clustering algorithm improves the repeated node aggregation algorithm (Algorithm 3) by breaking up modules of sub-optimally aggregated nodes. That is, once the algorithm assigns nodes to the same module, they are forced to move jointly when Infomap rebuilds the network, and what was an optimal move early in the algorithm might have the opposite effect later in the algorithm. Similarly, two or more modules that merge together and form a single module when the algorithm rebuilds the network can never be separated again in the repeated node aggregation algorithm. Therefore, the accuracy can be improved by breaking up modules of the final state of the repeated node aggregation algorithm and trying to move individual nodes or smaller submodules into new or neighboring modules. The two-level clustering algorithm (Algorithm 2) performs this refinement by iterating fine-tuning (Algorithm 5) and coarse-tuning (Algorithm 6).

Repeated Node Aggregation
The repeated node aggregation in Algorithm 3 follows closely the machinery of the Louvain method [34]. While the Louvain method seeks to maximize the modularity score, which fundamentally operates on link density rather than network flows, its repeated node aggregation machinery is effective also for optimizing other objectives than the modularity score.
Infomap aggregates neighboring nodes into modules, subsequently aggregates them into larger modules, and repeats. First, Infomap assigns each node to its own module. Then, in random sequential order, it tries to move each node to the neighboring or a new module that gives the largest code length decrease (Algorithm 4). If no move gives a code length decrease, the node stays in its original module. The change of code length for each possible move can be calculated locally based only on the change in enter and exit flows, the current flow in the moving node and affected modules, and the initial enter flow for the module. For state nodes, the physical node visit rates can be locally updated.
Infomap repeats this procedure, each time in a new random sequential order, until no move happens. Then Infomap rebuilds the network with the modules of the last level forming the nodes at the new level, and, exactly as in the previous level, aggregates the nodes into modules, which replace the existing ones. Infomap repeats this hierarchical network rebuilding until the map equation cannot be further reduced.
The computational complexity of the repeated node aggregation is considered to be O(N log N) for the Louvain method applied to networks with N nodes [34]. While Infomap optimizes the map equation with some more costly logarithms, there is no fundamental difference in the scaling. However, for sparse memory networks, the scaling applies to the number of state nodes. In any case, the first aggregation step is the most expensive because Infomap considers each node and all its links such that the complexity scales with the number of links in the network. To further optimize the map equation, Infomap performs stochastic tuning steps with more network dependent computational complexity.

Fine-Tuning
In the fine-tuning step, Infomap tries to move single nodes out from existing modules into new or neighboring modules (Algorithm 5). First, Infomap reassigns each node to be the sole member of its own module to allow for single-node movements. Then it moves all nodes back to their respective modules of the previous step. At this stage, with the same clustering as in the previous step but with each node free to move between the modules, Infomap reapplies the repeated node aggregation (Algorithm 3).

Coarse-Tuning
In the course-tuning step, Infomap tries to move submodules out from existing modules into new or neighboring modules (Algorithm 6). First, Infomap treats each module as a network on its own and applies repeated node aggregation (Algorithm 3) on this network. This procedure generates one or more submodules for each module. Infomap then replaces the modules by the submodules and moves the submodules into their modules as an initial solution. At this stage, with the same clustering as in the previous step but with each submodule free to move between the modules, Infomap reapplies the repeated node aggregation (Algorithm 3).

Multilevel Clustering
Infomap identifies multilevel clusterings by extending two-level clusterings both by iteratively finding supermodules of modules and by recursively finding submodules of modules. To find the optimal multilevel solution, Infomap first tries to find an optimal two-level clustering, then iteratively tries to find supermodules and then recursively tries to cluster those modules until the code length cannot be further improved (Algorithm 1).  To identify a hierarchy of supermodules from a two-level clustering, Infomap first tries to find a shorter description of flows between modules. It iteratively runs the two-level clustering algorithm (Algorithm 2) on a network with one node for each module at the coarsest level, node-visit rates from the module-entry rates, and links that describe aggregated flows between the modules (Algorithm 7). For each such step, a two-level code book replaces the coarsest code book. In this way, describing entries into, within, and out from supermodules at a new coarsest level replaces only describing entries into the previously coarsest modules. For a fast multilevel clustering, Infomap can keep this hierarchy of supermodules and try to recursively cluster the bottom modules into submodules with Algorithm 8. However, as this hierarchy of supermodules is constrained by the initial optimal two-level clustering, discarding everything but the top supermodules and starting the recursive submodule clustering from them often identifies a better multilevel clustering. To find submodules, Infomap runs the two-level clustering (Algorithm 2) followed by the super-module clustering (Algorithm 7) algorithm on the network within each module to find the largest submodule structures. If Infomap finds non-trivial submodules (that is, not one submodule for all nodes and not one submodule for each node), it replaces the module with the submodules and collects the submodules in a queue to explore each submodule in parallel for each hierarchical level. In this way, Infomap can explore the submodule hierarchy in parallel with a breadth-first search to efficiently find a multilevel clustering of the input network.

Mapping Multistep and Multi-Source Data with Infomap
We illustrate our approach by mapping network flows from air traffic data between more than 400 airports in the US [35]. For quarterly reported itineraries in 2011, we assembled pathways into four quarterly, two half-year, and one full-year collection. From pathways of length two, three, and four, for each collection we generated sparse memory networks corresponding to first-, second-, and third-order Markov models of flows. Then we generated sparse memory networks corresponding to multilayer networks with four layers from the quarterly collections and two layers from the half-year collections for all Markov orders. This example illustrates how sparse memory networks can go beyond standard representations and represent combinations of multilayer and memory networks.
To identify communities in multistep and multi-source air traffic data represented with sparse memory networks, we run Infomap ten times and report the median values in Table 1. Except for a small drop in the number of physical nodes for the third-order models because some airports are not represented in long itineraries, the number of physical nodes remain the same for all models. However, the number of state nodes and links between them increases with order. Larger systems require state-node lumping based on minimum information loss into more efficient sparse memory networks that correspond to variable-order Markov models. Such more compact descriptions of higher-order can balance over-and underfitting as well as cut the computational time [18]. In any case, higher-order representations can better capture constraints on flows. For example, a drop in the entropy rate by one bit corresponds to doubled flow prediction, and the four-bit drop between the first-and third-order models corresponds to a sixteenfold higher flow prediction accuracy. These flows move with relatively long persistence times among groups of airports that form connected legs in frequent itineraries. Infomap capitalizes on this modular flow pattern and identifies pervasively overlapping modules. Table 1. Mapping higher-order network flows from air traffic data. Sparse memory networks corresponding to Markov order m =1, 2, and 3 in one layer with data from quarters 1 + 2 + 3 + 4, in two layers with data from quarters 1 + 2 and 3 + 4, and in four layers with data from quarters 1, 2, 3, and 4, respectively. We report the number of physical node N P , state nodes N S , and links N L . The entropy rate H measures the number of bits required to predict where flows are going. D is the node weighted average depth of nodes in the multilevel clustering. The perplexity P of the module size distribution measures the effective number of modules, and the physical node weighted average of the perplexity of module assignments. A is the effective number of module assignments per physical node. Finally, the time corresponds to runs without parallelization on a 3.60 GHz desktop computer. We use relax rate r = 0. 25

Conclusions
The map equation applied to sparse memory networks provides a general solution to reveal modular patterns in higher-order flows through complex systems.
Rather than multiple community-detections algorithms for a range of network representations, the map equation's search algorithm Infomap can be applied to general higher-order network flows described by sparse memory networks. A sparse memory network uses abstract state nodes to describe higher-order dynamics and physical nodes to represent a system's objects. This distinction makes all the difference. The flexible sparse memory network can efficiently describe higher-order network flows of various types such as flows in memory and multilayer networks or their combinations. Simplifying and highlighting important patterns in these flows with the map equation and Infomap open for more effective analysis of complex systems.