Fuzzy Random Walkers with Second Order Bounds: An Asymmetric Analysis

Abstract: Edge-fuzzy graphs constitute an essential modeling paradigm across a broad spectrum of domains ranging from artificial intelligence to computational neuroscience and social network analysis. Under this model, fundamental graph properties such as edge length and graph diameter become stochastic and are consequently expressed in probabilistic terms. Thus, algorithms for fuzzy graph analysis must rely on non-deterministic design principles. One such principle is the Random Walker, a virtual entity which selects either edges or, as in this case, vertices of a fuzzy graph to visit. This allows the estimation of global graph properties through a long sequence of local decisions, making it a viable strategy candidate for graph processing software relying on native graph databases such as Neo4j. As a concrete example, Chebyshev Walktrap, a heuristic fuzzy community discovery algorithm relying on second order statistics and on the teleportation of the Random Walker, is proposed and its performance, expressed in terms of community coherence and number of vertex visits, is compared to the previously proposed algorithms of Markov Walktrap, Fuzzy Walktrap, and Fuzzy Newman–Girvan. In order to facilitate this comparison, a metric based on the asymmetric metrics of Tversky index and Kullback–Leibler divergence is used.


Introduction
The Random Walker principle is the algorithmic cornerstone for building a number of heuristics for large graphs, namely for those with the fundamental property that neither their vertex nor their edge set fits in main memory. Such heuristics are efficient in terms of computation time, memory requirements, or often both. Under this principle, a virtual entity usually named the Random Walker visits the vertices. Within the scope of this article, the probabilistic strategy followed by the Random Walker to decide which vertex it will visit next is of paramount importance, although, depending on the problem under study, other properties of the Random Walker may be of interest.
Virtual or ideal entities play an important role in science and engineering, mainly as a means to prove a theorem, to establish ideal performance limits, and to provide grounds for rejecting a conjecture based on a reductio ad absurdum methodology. Consider, for instance, the particle sorting demon of Maxwell [1,2] with its connections to algorithmic information theory and the steam engine of Heron of Alexandria [3]. In addition, the Random Walker principle itself has been applied to a number of graph analytics such as vertex similarity [4] and graph cuts [5] as well as to image processing [6].

Table 1.
Symbol | Meaning
$|\cdot|$ | Set cardinality or path length (depending on the context)
$(e_1, \ldots, e_m)$ | Path comprised of edges $e_1, \ldots, e_m$
$K_n$ | Complete graph with $n$ vertices and $\binom{n}{2}$ edges
$E[X]$ | Mean value of random variable $X$
$Var[X]$ | Variance of random variable $X$
$\tau_{T,V}$ | Tanimoto similarity coefficient for sets $T$ and $V$
$\nu_{T,V}$ | Asymmetric Tversky index for sets $T$ and $V$
$S_1 \setminus S_2$ | Asymmetric set difference $S_1$ minus $S_2$
$\tilde{S}$ | Fuzzy set $S$
$\langle s_k \rangle$ | Sequence of elements $s_k$
$H(s_1, \ldots, s_n)$ | Harmonic mean of elements $s_1, \ldots, s_n$
$H(s_1, \ldots, s_n; \tau_0)$ | Thresholded or effective harmonic mean of $s_1, \ldots, s_n$
$\mathbf{1}_n$ | $n \times 1$ vector with ones
$e_k^n$ | $n \times 1$ zero vector with a single one at the $k$-th entry
$f^{(n)}(x)$ | $n$-th order derivative of $f(x)$
$\langle p \,\|\, q \rangle$ | Kullback-Leibler divergence between distributions $p$ and $q$
Traditionally, from an algorithmic viewpoint, analytics include structural [17,18] and spectral [19,20] partitioning, where a graph is split according to some functional constraints such as flow or edge density. Efficient information diffusion in large graphs is also of interest [21,22], especially for online political campaigns and digital marketing. The Random Walker principle has also been applied to two other important metrics, namely vertex similarity [4] and heuristic minimum cuts [5]. Both metrics can also be treated deterministically, especially in the context of social network analysis [23,24]. Community structure discovery provides insight into the inner workings of a particular graph [7,8,17], while metrics such as those in [25] control the discovery process quality. Persistent graphs can be instrumental in designing rollback capabilities in graph databases [26].
Among the numerous applications of graphs or linked data, one can find Web searching and ranking [27] with established algorithms such as PageRank [28] and HITS [29]. Bibliometric and scientometric data analysis [30] can boost collaboration between researchers, while image segmentation [6] is central to computer vision and robotics. Social network analysis has greatly benefited from structural [31] or functional [32,33] community detection algorithms. Additionally, influence and perceived social status in online social media have been tied to the participation in communities [34]. Message diffusion within a social graph is studied in [34,35], while [36-38] deal with emotional modeling with respect to user influence [36]. Finally, random walkers have served as models for the propagation of computer viruses both in single systems and in networks, including LANs and the Internet [39,40]. Within this context, the strategy or the mix of strategies followed by the random walker is of paramount importance as it affects the identity and number of resources susceptible to infection.
First and second order statistics are used across a number of fields. In [41], a channel estimation methodology based on first order statistics is proposed. Methods for blind source separation using second order statistics include [42,43]. A comprehensive account of the applications of higher order methods is given in [44] for signal processing and in [45] for biomedical engineering. In [46], a third order method was presented for adaptively scheduling biosignal processing applications at the operating system level. Independent Component Analysis (ICA), a powerful signal processing technique, is based on higher order spectra [47]. Among the multitude of ICA applications is source separation in EEG waveforms [48].

Definitions
Within the scope of this paper, edge-fuzzy graphs are probabilistic and combinatorial hybrids comprised of a fixed set of vertices $V$ and a fuzzy set of edges $\tilde{E}$. Formally:
Definition 1. A homogeneous edge-fuzzy graph is the ordered triplet
$$G = (V, \tilde{E}, h),$$
where $V = \{v_k\}$ is the set of vertices, $\tilde{E} = \{e_k\} \subseteq V \times V$ is the fuzzy set of edges, and $h : \tilde{E} \to (0, 1]$ is the edge membership function, which quantifies the degree of participation of each $e_k$ in $G$ [10]. Moreover:
• the vertex set $V$ is fixed, namely all vertices belong to $G$ with probability one,
• the distribution of $h$ is the same for each edge,
• the existence probability of $e_k$ is drawn independently for each edge.
A subtle point is that $h$ does not affect the structural properties of the graph in the sense that no edges are added or deleted, except when $h(e_k) = 0$ holds for a particular $e_k$. If $h$ is continuous, the probability of this occurring is zero. However, if $h$ is discrete, then depending on $h$ a potentially non-negligible portion of the edges may be deleted. In this article, $h$ was chosen so that at most an exponentially small proportion of the edges would be assigned a zero weight. Consequently, the underlying graph preserved its original structure along with any associated connectivity patterns. Otherwise, if a considerable fraction of edges were to be deleted, then the resulting graph would behave more like an Erdős-Rényi graph. The latter are known to be easily constructed by randomly sampling a graph space or, equivalently, the edges of $K_n$, but their properties deviate in a significant way from those of real world, large graphs.
Observe that there is no fuzziness whatsoever regarding vertices, as by definition they always exist with probability one. The scientific literature also contains fuzzy graph classes where the vertices are fuzzy as well and their fuzziness interacts with that of the edges, mostly through long product chains. Such graphs are beyond the scope of this article.
Definition 2. Under the fuzzy graph model of Definition 1, the cost $\delta(e_k)$ of traversing $e_k$ is
$$\delta(e_k) = \frac{1}{h(e_k)},$$
which expresses the intuitive requirement that edges which are less likely to belong to the graph are also harder to cross.
Depending on the application, $\delta$ may well be connected to $h$ through another non-linear transform, as long as edges with high $h(e_k)$ are easy to cross and edges with low $h(e_k)$ are difficult to cross.
Definition 3. The cost $\Delta(p_j)$ of a fuzzy path $p_j = (e_1, \ldots, e_m)$ is the sum of the costs of its individual edges
$$\Delta(p_j) = \sum_{k=1}^{m} \delta(e_k) = \sum_{k=1}^{m} \frac{1}{h(e_k)} = \frac{m}{H(h(e_1), \ldots, h(e_m))},$$
where $H(h(e_1), \ldots, h(e_m))$ is the harmonic mean of the $h(e_k)$ defined as
$$H(h(e_1), \ldots, h(e_m)) = \frac{m}{\sum_{k=1}^{m} \frac{1}{h(e_k)}}.$$
By construction, $\Delta(p_j)$ is bounded as follows
$$\frac{m}{\max_{1 \leq k \leq m} h(e_k)} \leq \Delta(p_j) \leq \frac{m}{\min_{1 \leq k \leq m} h(e_k)}.$$
Whether the above bounds are loose depends on the variability of the actual values $h(e_k)$. That is, if the latter are drawn from a distribution which favors extreme values, $\Delta(p_j)$ will tend to be close to these bounds. It should also be noted that the variance of $\Delta(p_j)$ is strongly dependent on that of $h(e_k)$. Moreover, $\Delta(p_j)$ is prone to outliers, which might lead to an unrealistically high average fuzzy path length. This can be remedied by taking into account the variance of the fuzzy path length. Finally, a path with low $\Delta(p_j)$ tends to contain almost exclusively low edge costs $\delta(e_k)$ or, equivalently, edges with high probability of belonging to the graph, an argument which agrees with the weak law of large numbers. In turn, this suggests the intuitive corollary that low cost paths are comprised almost exclusively of edges that are less likely to be fuzzy. This corollary can be used in order to design efficient hybrid probabilistic and combinatorial algorithms based on dynamic programming for finding and enumerating low cost paths, akin to the way a similar observation has led to the development of shortest path algorithms relying on dynamic programming in deterministic graphs.
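As a minimal numerical illustration of the path cost, the following Python sketch assumes that the edge cost is the reciprocal of the membership value, so that the path cost equals the number of edges divided by the harmonic mean of the memberships. The function names and the sample membership values are hypothetical and serve only as an example.

```python
from statistics import harmonic_mean


def edge_cost(h):
    """Cost of crossing an edge with membership h in (0, 1], assumed to be 1 / h."""
    if not 0.0 < h <= 1.0:
        raise ValueError("membership must lie in (0, 1]")
    return 1.0 / h


def path_cost(memberships):
    """Cost of a fuzzy path: the sum of the costs of its individual edges."""
    return sum(edge_cost(h) for h in memberships)


def path_cost_bounds(memberships):
    """Bounds m / max(h) <= cost <= m / min(h) for a path with m edges."""
    m = len(memberships)
    return m / max(memberships), m / min(memberships)


h_values = [0.9, 0.8, 0.25, 0.5]  # hypothetical edge memberships
cost = path_cost(h_values)
lower, upper = path_cost_bounds(h_values)
# The same cost expressed through the harmonic mean of the memberships.
cost_via_harmonic_mean = len(h_values) / harmonic_mean(h_values)
```

In this example the single edge with membership 0.25 contributes almost half of the total cost, which illustrates how one hard-to-cross edge dominates the sum.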
From a probabilistic viewpoint, the sum $\sum_{k=1}^{m} \delta(e_k)$ is interesting by itself as a finite but possibly large sum of inverse random variables. Notice that the central limit theorem may not be applied in such a setting since the variance of the $h(e_k)$ might be infinite. If this is not the case, different bounds can be computed depending on the distribution of $h(e_k)$, such as the abovementioned central limit theorem, a Poisson bound, a power law bound, or finally approximations based on the Markov or Chebyshev inequalities, the Chernoff bound, or the Gnedenko extreme value theorem. Notice that the effect of a single edge which is exceedingly difficult to cross can be instrumental in shaping graph communities.
The numerical properties of the above sum are also of interest. As values of possibly uneven orders of magnitude may be added, small terms may be absorbed by much larger ones, resulting in the loss of meaningful information. This might happen if the summation is executed in an order left to the implementation. On the other hand, the summation order dictated by the Priest algorithm [49] results in the least possible loss of significant decimal digits by adding only numbers of comparable magnitude. Another option would be to substitute the harmonic mean with its thresholded counterpart $H(s_1, \ldots, s_n; \tau_0)$. For other uses of the thresholded harmonic and geometric means, see [50], while, for the effect of finite precision arithmetic on long biosignals, see [51].
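The effect of the summation order can be seen in a small Python experiment. The sketch below does not implement the Priest algorithm of [49]. It only contrasts a naive left-to-right sum with a sum in increasing order of magnitude and with the exactly rounded `math.fsum` of the Python standard library. The cost values are artificial and were chosen to expose the problem.

```python
import math


def naive_sum(values):
    """Left-to-right floating point summation in the given order."""
    total = 0.0
    for v in values:
        total += v
    return total


def sorted_sum(values):
    """Sum in increasing order of magnitude so that small terms accumulate first."""
    return naive_sum(sorted(values, key=abs))


# One exceedingly costly edge followed by a thousand cheap ones (artificial data).
costs = [1e16] + [1.0] * 1000

naive = naive_sum(costs)      # every 1.0 is absorbed by the huge first term
ordered = sorted_sum(costs)   # the cheap edges are added together first
exact = math.fsum(costs)      # exactly rounded sum from the standard library
```

With the naive order the contribution of the thousand cheap edges is lost entirely, whereas both alternatives retain it.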
An alternative for long paths would be to substitute, under certain conditions, the finite sum with an appropriate integral. Assuming with no loss of generality that $h(e_1) = \min_{1 \leq k \leq m} \{h(e_k)\} \neq 0$ and $h(e_m) = \max_{1 \leq k \leq m} \{h(e_k)\} \neq 0$, the sum can be approximated by an integral taken from $h(e_1)$ to $h(e_m)$ together with an optional correction factor $\rho_0$. A finer approach requiring more probabilistic information about the longer paths of a given graph would be to partition such a path $p_j$ into $n$ segments, each with its own integration limits and correction factor. Selecting $n$ and forming the sets $\{h(e_{u_i})\}_{i=1}^{n}$, $\{h(e_{\ell_i})\}_{i=1}^{n}$, and $\{\rho_i\}_{i=1}^{n}$ is not a trivial task. Thus, choosing such an approach might be a viable solution only for certain combinations of $h$ and $p_j$. Techniques for estimating the variability as well as the cardinality of large sets such as [52] can be useful while pursuing this approach.
It should be emphasized that the class of fuzzy graphs of Definition 1 can well be considered a typical example of higher order data. This is attributed to the inherently distributed way information is stored in a graph, in this particular case as edge existence probabilities. In order for meaningful information regarding path costs to be mined, a non-negligible fraction of edges must be crossed and, thus, the interplay of a number of edges must be considered.

Reciprocal Random Variables
Because of Definitions 2 and 3 for $\delta(e_k)$ and $\Delta(p_j)$, respectively, the properties of an inverse random variable are of particular interest. The following definition is straightforward.
Definition 4. The inverse distribution of a mass distribution function of a random variable $X$ is defined as the mass distribution function of $\frac{1}{X}$ [53].
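Property 1. In the continuous case, the distributions of $X$ and $\frac{1}{X}$ are linked as
$$f_{1/X}(y) = \frac{1}{y^2} f_X\left(\frac{1}{y}\right). \tag{10}$$
Proof. For a strictly positive $X$ and $y > 0$, the cumulative distribution of $\frac{1}{X}$ is defined as
$$F_{1/X}(y) = P\left(\frac{1}{X} \leq y\right) = P\left(X \geq \frac{1}{y}\right) = 1 - F_X\left(\frac{1}{y}\right).$$
By differentiating the last relationship, the stated result follows.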
For instance, if $X$ is the continuous uniform random variable in $[\alpha_1, \alpha_2]$, where $0 < \alpha_1 < \alpha_2$, then
$$f_{1/X}(y) = \frac{1}{(\alpha_2 - \alpha_1)\, y^2}, \qquad \frac{1}{\alpha_2} \leq y \leq \frac{1}{\alpha_1}.$$
Despite its simplicity, in certain scenarios relationship (10) cannot be used. For instance, only the first moments of $X$ may be known, or $y = 0$ might be a legitimate value, in which case there is a singularity in the inversion of $f_X(x)$. Instead, bounds are sought for the first moments of $\frac{1}{X}$, which makes more sense from a programming viewpoint in the case of large graphs.
The Jensen inequality provides a straightforward way to bound the expected value $E\left[\frac{1}{X}\right]$ by using the expected value $E[X]$ of the non-zero random variable $X$.
Theorem 1. (Jensen inequality) For any random variable $X$ and any convex function $g(x)$, provided that both the domain of $X$ and $E[X]$ are subsets of the domain of $g(\cdot)$,
$$g(E[X]) \leq E[g(X)]. \tag{13}$$
Corollary 1. For a strictly positive random variable $X$, the mean value of $\frac{1}{X}$ has a lower bound of
$$E\left[\frac{1}{X}\right] \geq \frac{1}{E[X]}.$$
Property 2. Function $g(x) = \frac{1}{x}$ is convex when $x > 0$.
Proof. Notice that the second derivative $g^{(2)}(x) = \frac{2}{x^3}$ is positive when $x$ is positive. An alternative way to prove this claim is to apply the standard convexity definition. For every $\alpha_0 \in [0, 1]$, $x_1 > 0$, and $x_2 > 0$,
$$\frac{1}{\alpha_0 x_1 + (1 - \alpha_0) x_2} \leq \frac{\alpha_0}{x_1} + \frac{1 - \alpha_0}{x_2}.$$
In order to derive realistic upper bounds for the path lengths of a given fuzzy graph, certain probabilistic inequalities can be employed. The first is the Markov inequality, which establishes a first order bound for the probability of $X$ taking very large values.
Theorem 2. (Markov inequality) The probability of a strictly positive random variable $X$ exceeding $\gamma_0 > 0$ is bounded by
$$P(X \geq \gamma_0) \leq \frac{E[X]}{\gamma_0}.$$
Second order bounds can be derived by the Chebyshev inequality. The latter provides tighter bounds while lifting the positivity assumption.
Theorem 3. (Chebyshev inequality) The probability of an arbitrary random variable $X$ deviating from its expected value by at least a multiple $\gamma_0 > 0$ of its standard deviation is bounded as
$$P\left(|X - E[X]| \geq \gamma_0 \sqrt{Var[X]}\right) \leq \frac{1}{\gamma_0^2}.$$
The Chebyshev inequality is generic enough to be applied in a number of scenarios, including those in the present article. Still, it should be noted that other techniques may provide sharper bounds in certain cases. For instance, when $X$ is normally distributed, specialized methods exist for evaluating the integral under its tail.
Estimating the variance of a transformed random variable can be done through the delta method.
Theorem 4. (Delta method) Let $X$ be a random variable whose expected value $E[X]$ and variance $Var[X]$ are known. For an analytic $g(x)$, the variance of $g(X)$ can be estimated as
$$Var[g(X)] \approx \left(g^{(1)}(E[X])\right)^2 Var[X]. \tag{18}$$
Proof. The first order approximation of the Taylor expansion of $g(X)$ around $E[X]$ is
$$g(X) \approx g(E[X]) + g^{(1)}(E[X]) \left(X - E[X]\right).$$
Taking the variance of both sides along with the identities $Var[X + c] = Var[X]$ and $Var[cX] = c^2 Var[X]$ for a constant $c$ yields the stated result.
Corollary 2. For $g(x) = \frac{1}{x}$, the delta method yields
$$Var\left[\frac{1}{X}\right] \approx \frac{Var[X]}{E[X]^4}.$$
The Markov and Chebyshev inequalities are but two of the probabilistic inequalities collectively known as concentration inequalities, which bound the deviation of a random variable or a sequence of random variables from a known value. Other such inequalities include the Talagrand, the Efron-Stein, and the Dvoretzky-Kiefer-Wolfowitz inequalities.
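The quality of the Jensen lower bound and of the delta method estimate for a reciprocal random variable can be checked on a case where the exact moments are available in closed form. The Python sketch below uses a uniform random variable on a strictly positive interval. The interval endpoints and the function names are illustrative assumptions rather than values taken from the experiments.

```python
import math


def reciprocal_moments_uniform(a1, a2):
    """Exact mean and variance of 1/X for X uniform on [a1, a2] with 0 < a1 < a2."""
    mean = math.log(a2 / a1) / (a2 - a1)   # E[1/X]
    second_moment = 1.0 / (a1 * a2)        # E[1/X^2]
    return mean, second_moment - mean ** 2


def jensen_lower_bound(mean_x):
    """Lower bound 1 / E[X] on E[1/X] for a strictly positive X."""
    return 1.0 / mean_x


def delta_variance(mean_x, var_x):
    """Delta method estimate Var[X] / E[X]^4 of Var[1/X]."""
    return var_x / mean_x ** 4


a1, a2 = 0.5, 1.0                 # hypothetical membership range
mean_x = (a1 + a2) / 2            # mean of the uniform distribution
var_x = (a2 - a1) ** 2 / 12       # variance of the uniform distribution

exact_mean, exact_var = reciprocal_moments_uniform(a1, a2)
bound = jensen_lower_bound(mean_x)
estimate = delta_variance(mean_x, var_x)
```

For this interval the Jensen bound lies slightly below the exact mean, and the delta method gives roughly 0.066 against an exact variance of roughly 0.078, which is the kind of first order accuracy to be expected from a linearization.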

Deterministic Walktrap
The original Walktrap algorithm [9] simulates an edge crossing random walker in order to estimate the stationary distribution of a homogeneous Markov chain. The walker can commence from any vertex and cross edges by randomly selecting destination vertices, systematically moving to vertices with high edge density as vertices are selected with probability proportional to their degree. Since vertices can be visited an arbitrary number of times, unlike in algorithms such as BFS and DFS, eventually some patterns in the vertex visiting sequence will emerge. As a community is, from a structural perspective, essentially a locally dense graph segment, the walker is more likely to move along vertices belonging to the same community for a long time interval before moving to another community. Thus, analysis of the vertex sequence generated by the random walker can reveal the underlying graph community structure. The Walktrap algorithm is outlined in Algorithm 1.

Algorithm 1: Deterministic Walktrap
Require: graph $G(V, E)$, termination criterion $\tau_0$
Ensure: vertex pair sequence $\langle s_k, s_k^* \rangle$ is generated
1: pick a random vertex $v$
2: repeat
3:   pick a neighboring vertex $v^*$ with probability proportional to its degree as in (23)
4:   store current vertex in $v$ and move the walker to the new vertex $v^*$ and $(s_{k+1}, s_{k+1}^*) \leftarrow (v, v^*)$
5: until $\tau_0$ is true
6: return $\langle s_k, s_k^* \rangle$
The degree of any neighboring vertex can be determined by the graph adjacency matrix $A$ defined as
$$A[i, j] = \begin{cases} 1, & (v_i, v_j) \in E \\ 0, & \text{otherwise.} \end{cases}$$
Specifically, the degree of $v_k$ is the sum of the $k$-th column of $A$. The probability that from vertex $v_p$ a neighbor $v_q$ is selected at the next step is directly proportional to
$$\deg(v_q) = \mathbf{1}_n^T A \, e_q^n. \tag{23}$$
In contrast to many graph algorithms, each vertex may be visited more than once. In fact, vertices must be visited many times in order for meaningful patterns to emerge regarding community structure. Typically, for deterministic graphs, a constant number of visits per vertex may suffice, resulting in a total of $O(|V|)$ visits, though techniques exploiting the self-similar nature of large, scale free graphs may yield somewhat lower bounds of $O(\log^{1+\epsilon} |V|)$, $\epsilon > 0$. For fuzzy graphs, it is as yet unknown whether the linear bound can be improved.
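The walking phase of Algorithm 1 can be sketched in Python as follows. The adjacency list representation, the function name `deterministic_walk`, the fixed number of steps standing in for the termination criterion, and the toy graph of two triangles joined by a bridge are all assumptions made for illustration.

```python
import random
from collections import Counter


def deterministic_walk(adjacency, steps, seed=None):
    """Generate the vertex pair sequence of a degree-biased random walk.

    adjacency maps each vertex to the list of its neighbors and every
    vertex is assumed to have at least one neighbor.
    """
    rng = random.Random(seed)
    degree = {v: len(neighbors) for v, neighbors in adjacency.items()}
    v = rng.choice(list(adjacency))          # step 1: random start vertex
    pairs = []
    for _ in range(steps):                   # steps 2-5: repeat until done
        neighbors = adjacency[v]
        weights = [degree[u] for u in neighbors]
        # step 3: neighbor chosen with probability proportional to its degree
        v_next = rng.choices(neighbors, weights=weights, k=1)[0]
        pairs.append((v, v_next))            # step 4: record the pair and move
        v = v_next
    return pairs


# Two triangles joined by the bridge (2, 3): a toy graph with two communities.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3],
         3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
pairs = deterministic_walk(graph, steps=200, seed=7)
pair_frequency = Counter(pairs)              # input for the clustering phase
```

The pair frequencies collected at the end are the quantities that a subsequent clustering phase would use as weights.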
In a distributed setting such as Hadoop, the Deterministic Walktrap can be scaled up since graph segments can be distributed to the nodes. The map part will be the parallel random walkers crossing edges. If such a walker must cross a segment, it can either bounce back or be transferred to the appropriate node. The reduce part will be the frequency count of a large number of vertex pairs.
Once the random walker has finished crossing the graph, the communities are discovered by means of hierarchical clustering using the frequency of pairs $(s, s^*)$ as weights. It should be noted that other methods such as Hidden Markov Models and text mining techniques dealing with missing values [54] may be used for discerning community patterns in the sequence $\langle s_k, s_k^* \rangle$.

Fuzzy Walktrap
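The Fuzzy Walktrap algorithm has been proposed and analyzed in [10]. Similarly to Algorithm 1, given a vertex $v_p$, each neighbor $v_q$ is a candidate for being visited by the Random Walker with probability proportional to the probability of the connecting edge belonging to the fuzzy graph, namely proportional to
$$(e_p^n)^T A_F \, e_q^n = A_F[p, q], \tag{24}$$
where the fuzzy adjacency matrix $A_F$ is defined as
$$A_F[i, j] = \begin{cases} h(e_{i,j}), & e_{i,j} = (v_i, v_j) \in \tilde{E} \\ 0, & \text{otherwise.} \end{cases}$$
Fuzzy Walktrap is outlined in Algorithm 2.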

Algorithm 2: Fuzzy Walktrap
Require: fuzzy graph $G(V, \tilde{E}, h)$, termination criterion $\tau_0$
Ensure: vertex pair sequence $\langle s_k, s_k^* \rangle$ is generated
1: pick a random vertex $v$
2: repeat
3:   pick a neighboring vertex $v^*$ with probability proportional to the membership of the connecting edge as in (24)
4:   store current vertex in $v$ and move the walker to the new vertex $v^*$ and $(s_{k+1}, s_{k+1}^*) \leftarrow (v, v^*)$
5: until $\tau_0$ is true
6: return $\langle s_k, s_k^* \rangle$
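A Python sketch of the walking phase of Algorithm 2 is given below. It assumes that each neighbor is selected with probability proportional to the membership value of the connecting edge, that the fuzzy graph is stored as a nested dictionary of memberships, and that the termination criterion is a fixed number of steps. The names and the toy membership values are hypothetical.

```python
import random


def fuzzy_walk(membership, steps, seed=None):
    """Generate the vertex pair sequence of a membership-biased random walk.

    membership[v][u] holds the membership h in (0, 1] of the edge (v, u).
    Every vertex is assumed to have at least one incident edge.
    """
    rng = random.Random(seed)
    v = rng.choice(list(membership))
    pairs = []
    for _ in range(steps):
        neighbors = list(membership[v])
        weights = [membership[v][u] for u in neighbors]
        # Edges that are more likely to exist are more likely to be crossed.
        v_next = rng.choices(neighbors, weights=weights, k=1)[0]
        pairs.append((v, v_next))
        v = v_next
    return pairs


# Two tightly knit triangles joined by a weak bridge (hypothetical values).
fuzzy_graph = {
    0: {1: 0.9, 2: 0.8},
    1: {0: 0.9, 2: 0.95},
    2: {0: 0.8, 1: 0.95, 3: 0.05},
    3: {2: 0.05, 4: 0.9, 5: 0.85},
    4: {3: 0.9, 5: 0.9},
    5: {3: 0.85, 4: 0.9},
}
fuzzy_pairs = fuzzy_walk(fuzzy_graph, steps=300, seed=11)
bridge_crossings = sum(1 for a, b in fuzzy_pairs if {a, b} == {2, 3})
```

Because the bridge has a low membership value, the walker is expected to cross it rarely and to spend long stretches inside one triangle, which is the behavior that the clustering phase exploits.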

Markov Walktrap and Chebyshev Walktrap
Both Markov Walktrap, proposed in [10], and Chebyshev Walktrap, introduced in this article, improve Fuzzy Walktrap in two ways. The first is that, during the walking phase, the Random Walker has two optional safeguards against being confined inside a community for too long. Both of these safeguards are common to Markov Walktrap and Chebyshev Walktrap. The second improvement is that, during the clustering phase, two communities may not merge if the path lengths within the resulting community exceed a certain threshold. The latter is based on first order statistics for the Markov Walktrap and on second order statistics for the Chebyshev Walktrap.
Regarding the control of community merge, for community $V_k$, the mean path cost $\pi_k$ is the sum of the individual edge costs $\delta(e_j)$ and, therefore, is also a random variable. By linearity of expectation,
$$E[\pi_k] = \sum_{j} E[\delta(e_j)].$$
Moreover, as the $\delta(e_j)$ are independent,
$$Var[\pi_k] = \sum_{j} Var[\delta(e_j)].$$
Although in the general case the distribution of $\delta(e_j)$ is unknown, for specific choices of $h$ it can be computed or estimated. Alternatively, since $\pi_k$ is a sum of random variables, its distribution may be known for certain special cases. For instance, the sum of independent Poisson random variables is another Poisson random variable. In addition, the sum of independent binomial random variables with a common success probability is also a binomial random variable. Finally, the sum of a large number of independent random variables with finite variance is approximately a normal random variable according to the Central Limit Theorem. Nonetheless, in this article, no such assumptions were made and the expected value and the variance of $\pi_k$ were approximated by the Jensen inequality and the delta method, respectively, in Equations (13) and (18).
Since $\pi_k$ is by construction positive, the Markov inequality can be applied. Therefore,
$$P(\pi_k \geq \gamma_0) \leq \frac{E[\pi_k]}{\gamma_0}$$
or, equivalently,
$$P(\pi_k < \gamma_0) \geq 1 - \frac{E[\pi_k]}{\gamma_0}.$$
If, for a threshold $\alpha_0 \in (0, 1)$, $\pi_k$ exceeds $\alpha_0 \gamma_0$, then that community is excluded from merging for an iteration provided it has at least $\xi_0$ vertices. This is a first order probabilistic safeguard preventing almost formed communities from losing their coherence.
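On similar grounds, a second order safeguard of this kind can be built on the Chebyshev inequality
$$P\left(|\pi_k - E[\pi_k]| \geq \gamma_0 \sqrt{Var[\pi_k]}\right) \leq \frac{1}{\gamma_0^2}.$$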
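To make the difference between the first and second order safeguards concrete, the following Python sketch evaluates both tail bounds for the cost of a path with independent edges. It is an illustration under stated assumptions and not the merge rule of the algorithms themselves: the mean cost is approximated by the Jensen lower bound, the variance by the delta method, and the Chebyshev bound is used in its one-sided consequence for a threshold above the mean. All names and numerical values are hypothetical.

```python
def path_cost_moments(mean_h, var_h, m):
    """Approximate mean and variance of the cost of m independent edges.

    Each edge cost is taken as 1 / h. The mean uses the Jensen lower
    bound 1 / E[h] and the variance the delta method Var[h] / E[h]^4.
    """
    mean_cost = m / mean_h
    var_cost = m * var_h / mean_h ** 4
    return mean_cost, var_cost


def markov_bound(mean_cost, threshold):
    """First order bound on P(cost >= threshold) for a positive cost."""
    return min(1.0, mean_cost / threshold)


def chebyshev_bound(mean_cost, var_cost, threshold):
    """Second order bound on P(cost >= threshold), valid for threshold > mean."""
    if threshold <= mean_cost:
        return 1.0
    return min(1.0, var_cost / (threshold - mean_cost) ** 2)


mean_cost, var_cost = path_cost_moments(mean_h=0.5, var_h=0.01, m=10)
first_order = markov_bound(mean_cost, threshold=40.0)
second_order = chebyshev_bound(mean_cost, var_cost, threshold=40.0)
```

For these values the second order bound is two orders of magnitude smaller than the first order one, which is why a safeguard based on the variance can be considerably more selective.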

Escape Strategies
Although the purpose of the Random Walker is to discover communities by repeatedly visiting neighboring vertices and crossing low cost edges, it may become trapped inside a community if the latter is connected to the remaining graph only through very high cost edges. To this end, the Random Walker has the option, as in [15], to reverse its strategy and select neighboring vertices with probability inversely proportional to their probability of existence if a random flag is triggered. The latter was implemented as a Bernoulli random variable with success probability $q_0$, which can be set to zero in order to disable the weight inversion strategy. Recommended values are typically $O(|V|^{-\epsilon})$, $\epsilon \geq 2$. Therefore, as in [15], when weight inversion is enabled, the distance $d$ between communities implicitly depends on terms formed from the inverted edge weights instead of terms formed from the original ones.
Alternatively, a probabilistically triggered restart of the Random Walker akin to the PageRank teleportation [28], the random mutation operator in a genetic algorithm [55,56], or the restart strategy in the GMRES iterative solver for linear systems [57,58] was also considered. The relocation flag is a Bernoulli random variable with success probability $q_1$ and is evaluated independently at each step. The number of steps before such a relocation takes place is finite with probability one, however small $q_1$ may be, as long as it remains strictly positive. The number of steps $N$ to the first relocation has a geometric distribution with success probability equal to $q_1$ and mass distribution function
$$P(N = n) = (1 - q_1)^{n-1} q_1, \qquad n \geq 1.$$
Therefore, the expected value and variance of $N$, respectively, are
$$E[N] = \frac{1}{q_1}, \qquad Var[N] = \frac{1 - q_1}{q_1^2}.$$
Even though the relocation modification clearly violates the inherent locality of the Walktrap family of heuristics, if properly calibrated, it happens infrequently enough so as not to severely degrade time performance. Moreover, in a distributed system, a simple move of the random walker to the appropriate graph segment suffices and its cost is certainly affordable. Finally, since relocation is a rare event, the total number of relocations can be modeled by a Poisson distribution.
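The waiting time until the first relocation can be explored with the short Python sketch below, which assumes that the relocation flag is an independent Bernoulli trial at each step, so that the number of steps up to and including the first relocation is geometric on 1, 2, and so on. The function names, the relocation probability, and the sample size are illustrative choices.

```python
import random


def relocation_stats(q1):
    """Mean 1 / q1 and variance (1 - q1) / q1^2 of the geometric waiting time."""
    if not 0.0 < q1 <= 1.0:
        raise ValueError("q1 must lie in (0, 1]")
    return 1.0 / q1, (1.0 - q1) / q1 ** 2


def steps_to_first_relocation(q1, rng):
    """Simulate one walk and count the steps up to the first relocation."""
    steps = 1
    while rng.random() >= q1:   # the flag fails with probability 1 - q1
        steps += 1
    return steps


q1 = 0.25                                   # hypothetical relocation probability
expected_steps, variance_steps = relocation_stats(q1)

rng = random.Random(2017)
samples = [steps_to_first_relocation(q1, rng) for _ in range(20000)]
empirical_mean = sum(samples) / len(samples)
```

In practice a much smaller relocation probability would be used, so that the expected number of steps between relocations is large and locality is mostly preserved.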

Data
In order to experimentally evaluate the performance of Markov Walktrap, a Kronecker synthetic graph [59-61] has been created. Kronecker graphs are recursively constructed from an original generator graph with the following model
$$A_{k+1} = A_k \otimes A_0, \tag{36}$$
where $A_0$ is the generator graph and $\otimes$ denotes the Kronecker tensor product.
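The recursive Kronecker construction can be sketched in pure Python as follows, assuming the common form in which the current adjacency matrix is repeatedly multiplied by the generator through the Kronecker product. The 2 x 2 generator used here is a hypothetical stand-in and not the generator of the experiments.

```python
def kronecker_product(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[a_ij * b_kl for a_ij in row_a for b_kl in row_b]
            for row_a in a for row_b in b]


def kronecker_graph(generator, iterations):
    """Apply the Kronecker step to the generator the given number of times."""
    adjacency = generator
    for _ in range(iterations):
        adjacency = kronecker_product(adjacency, generator)
    return adjacency


generator = [[1, 1],
             [1, 0]]                      # hypothetical generator matrix
adjacency = kronecker_graph(generator, iterations=2)
n_vertices = len(adjacency)               # 2 ** 3 rows after two iterations
n_nonzero = sum(map(sum, adjacency))      # 3 ** 3 nonzero entries
```

The number of vertices grows exponentially with the number of iterations, which is why a small generator suffices for obtaining a large synthetic graph.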
The generator graph has $p = 7$ vertices numbered from 0 to 6, and each $v_k$ is connected to three other vertices $v_{k_1}$, $v_{k_2}$, and $v_{k_3}$. The generator graph is shown in Figure 1. The Kronecker model of Equation (36) has been executed six times in order to obtain a large graph $Y$ whose properties are summarized in Table 2.
Definitions 5 and 6 outline four important structural graph metrics.
Definition 5. The (log)density of a fuzzy graph is the ratio of the (logarithm of the) number of its edges to the (logarithm of the) number of its vertices.
Definition 6. The (log)completeness of a fuzzy graph is the ratio of the (logarithm of the) number of its edges to the (logarithm of the) number of the edges of the complete graph with the same number of vertices.
Observe that $Y$ is highly connected, as it has a low diameter of relatively low cost, a high average degree, and a high number of triangles and squares.
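The four metrics of Definitions 5 and 6 translate directly into code. The Python sketch below assumes an undirected simple graph, so that the complete graph on the same vertex set has n(n - 1)/2 edges. The function names and the sample counts are hypothetical.

```python
import math


def density(n_edges, n_vertices):
    """Ratio of the number of edges to the number of vertices."""
    return n_edges / n_vertices


def log_density(n_edges, n_vertices):
    """Ratio of the logarithms of the edge and vertex counts."""
    return math.log(n_edges) / math.log(n_vertices)


def completeness(n_edges, n_vertices):
    """Ratio of the edge count to that of the complete graph on the same vertices."""
    return n_edges / math.comb(n_vertices, 2)


def log_completeness(n_edges, n_vertices):
    """Ratio of the logarithms of the two edge counts used in completeness."""
    return math.log(n_edges) / math.log(math.comb(n_vertices, 2))


# A hypothetical graph with 1000 vertices and 25000 edges.
metrics = (density(25000, 1000), log_density(25000, 1000),
           completeness(25000, 1000), log_completeness(25000, 1000))
```

The logarithmic variants compress the wide range that the plain ratios take on large graphs, which makes graphs of different sizes easier to compare.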

Time and Memory Requirements
In Table 3, the total execution time for Chebyshev Walktrap (CW), Markov Walktrap (MW), Fuzzy Walktrap (FW), and Fuzzy Newman-Girvan (FN-G) is shown. The last two algorithms are outlined in [10], whereas Fuzzy Markov was proposed in [15]. The effect of the escape mechanisms of weight inversion (I) and relocation (R) is also shown for Markov Walktrap and Chebyshev Walktrap. The Fuzzy Newman-Girvan is an exhaustive algorithm that will serve as a baseline both for the requirements and the clustering quality. For the Walktrap algorithms, the times for the two phases, namely random walking (RW) and community building (CB), are recorded separately, while for the Fuzzy Newman-Girvan case only the total time is recorded, as there is only a single phase. In addition, the last column of Table 3 lists the number of the vertices visited by the random walker.
Fuzzy Newman-Girvan is considerably slower than any member of the Walktrap family of algorithms. This can be attributed to the exhaustive nature of Fuzzy Newman-Girvan as well as to the extensive use of locality by the Walktrap family. Moreover, the probabilistic constraints of Markov Walktrap and Chebyshev Walktrap resulted in the acceleration of both phases of the respective algorithms, with the second order constraints yielding the lowest times in each case. Concerning the escape strategy of the random walker, the relocation option resulted in a slower walking phase but in an accelerated community building phase, with that combination being more efficient than both weight inversion and the combination of the two escape strategies. Therefore, it is not recommended to activate both escape strategies at the same time, although omitting an escape strategy altogether is not advisable either. At any rate, the Chebyshev Walktrap with relocation (CW+R) had the best overall performance, closely followed by the Markov Walktrap with relocation (MW+R), with the original Fuzzy Markov being the slowest member of the family.
An explanation for the time achieved under the relocation strategy is that the teleportation of the random walker results in cache misses, which translate to expensive fetch cycles in the memory hierarchy system. This can be seen in the last two columns of Table 3, as there is no clear correspondence between the number of total visits and the total walking phase time. When relocation is enabled, the mean visit time is clearly higher. At any rate, the number of visits is linear in the vertex set cardinality.
In addition, the selection of $h$ did not appear to have a significant performance impact, although in most cases the random walker was slower when $h$ was a Poisson random variable, both in terms of time and in terms of total visits. This can be attributed to the large number of high cost edges, which forced the walker to bounce more times inside a community before eventually moving to another. On the other hand, the symmetric form of the binomial distribution mass function resulted in a larger number of low cost edges, facilitating the movement of the random walker and making the communities easily separable compared to the Poisson case.
The memory requirements were monitored with the Ubuntu Watch administrative tool as presented in Table 4. In contrast to other similar tools such as htop, Watch generates a text output which can be parsed and analyzed. It was periodically run every 10 s through a bash script, resulting in records of several thousand entries each. When relocation is selected, memory utilization has certain spikes, as can be inferred from the increased maximum memory occupied and the increased standard deviation. This is a direct result of the random walker teleportation, which temporarily annuls any scheduling optimization as well as any caching done at the software or hardware level.

Community Coherence
The following definition will facilitate further analysis of the experimental results.
Definition 7. The (log)scree plot of a set $S$ is the plot of the (logarithm of the) values of $S$ versus their sorted frequency.
Since $Y$ does not contain ground truth communities, the communities obtained by the Fuzzy Newman-Girvan will be used as a baseline reference, since their sizes are closer to a power law distribution, which is an essential property of large, scale-free graphs. The deviation $\xi$ of a set of numbers $\{x_k\}_{k=1}^{n}$ from a power law is quantified by the formula of [62,63], where the parameters $\alpha_0$ and $\gamma_0$ can be estimated by, for instance, a least squares method [25]. Additionally, the estimated value of $\alpha_0$ serves as a quality indicator, as it should lie within or as close as possible to the interval $[2, 3]$.
The number of communities for each algorithm is shown in Table 5. Notice that this is not an absolute clustering quality metric, as typically a large number of coherent communities is preferable to a smaller number of sparse ones. Nonetheless, the introduction of the relocation strategy systematically pushes the number of communities towards the reference number, although more evidence is required for determining community coherence. This will be addressed by the two asymmetric indices of this section.
In order to evaluate the clustering quality, the Kullback-Leibler divergence between the sorted sizes of the communities generated by the Fuzzy Newman-Girvan and the sorted community sizes of the remaining algorithms was computed. Recall that for two discrete distributions $p_k$ and $q_k$ the Kullback-Leibler divergence is defined as
$$\langle p \,\|\, q \rangle = \sum_{k} p_k \log \frac{p_k}{q_k},$$
where $k$ ranges over the union of discrete events. If $p_k$ and $q_k$ have no events in common, then the result is undefined. If for a single event $p_k = 0$ or $q_k = 0$, then the corresponding summand is zero. Table 6 summarizes the divergence for the Poisson and the binomial cases. Chebyshev Walktrap with relocation outperforms the remaining algorithms, as it has the lowest divergence from the reference distribution.
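The comparison of sorted community sizes through the Kullback-Leibler divergence can be sketched in Python as below. The sketch assumes the usual definition of the divergence as the sum of p_k log(p_k / q_k), adopts the convention stated in the text that a summand with a zero probability on either side contributes nothing, and pads the shorter size list with zeros. The function names and the community sizes are hypothetical.

```python
import math


def size_distribution(sizes, length):
    """Turn community sizes into a sorted, zero-padded probability vector."""
    ordered = sorted(sizes, reverse=True) + [0] * (length - len(sizes))
    total = sum(ordered)
    return [s / total for s in ordered]


def kl_divergence(p, q):
    """Kullback-Leibler divergence with zero summands skipped."""
    total = 0.0
    for p_k, q_k in zip(p, q):
        if p_k == 0.0 or q_k == 0.0:
            continue            # convention: such summands contribute zero
        total += p_k * math.log(p_k / q_k)
    return total


reference_sizes = [120, 60, 30, 15, 8]    # hypothetical baseline communities
heuristic_sizes = [110, 70, 28, 25]       # hypothetical heuristic communities
length = max(len(reference_sizes), len(heuristic_sizes))
p = size_distribution(reference_sizes, length)
q = size_distribution(heuristic_sizes, length)
forward = kl_divergence(p, q)
backward = kl_divergence(q, p)
```

The two directions generally give different values, which is why the reference distribution has to be fixed as the first argument before algorithms are compared.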
A question at this point is whether a correspondence between the communities returned by each algorithm can be found. The asymmetric Tversky index between two sets $T$ and $V$ is defined as
$$\nu_{T,V} = \frac{|T \cap V|}{|T \cap V| + w_1 |T \setminus V| + w_2 |V \setminus T|}, \tag{44}$$
and it quantifies how close the variant set $V$ is to the template set $T$. By the very definition of the index, the template set $T$ and the variant set $V$ are not interchangeable, namely $\nu_{T,V} \neq \nu_{V,T}$ in general. This agrees with intuition, as it makes sense to ask how much the heuristic results differ from the ground truth community, whereas there is no point in asking the inverse question. On the contrary, with a symmetric metric such as, for instance, the Tanimoto similarity coefficient, no distinction can be made between the template and the variant, which can potentially lead to misleading results.
At this point, it should be highlighted that Fuzzy Newman-Girvan was executed only once, since it is a deterministic algorithm.
Returning to Equation (44), the case $w_1 + w_2 = 1$ is of particular interest in data mining, as it confines the coefficients on the plane which maximizes the minimum distance of $T$ from $V$. Notice that algebraically this asymmetry stems from both the terms $|T \setminus V|$ and $|V \setminus T|$, which denote the number of elements of $T$ not found in $V$ and vice versa. Both terms signify in their own way how $V$ is different from $T$. The former corresponds to the part of $T$ which is missing from $V$, whereas the latter corresponds to any additions to $V$. As a rule, $|T \setminus V|$ is more important and, consequently, $w_1 > w_2$. As there is no standard rule for selecting $w_1$ and $w_2$, the following two schemes have been used, a linear and an exponential:
$$w_1 = \frac{s}{s+1}, \quad w_2 = \frac{1}{s+1} \qquad \text{and} \qquad w_1 = \frac{e^{s}}{e^{s}+1}, \quad w_2 = \frac{1}{e^{s}+1}.$$
Observe that in the first case $w_1 / w_2 = s$, while in the second $w_1 / w_2 = e^{s}$, which clearly represents a non-linear scaling of the first case. Furthermore, the second case is considerably biased in favor of $|T \setminus V|$.
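The two weighting schemes can be sketched as below, under the assumption that the weights are normalized so that $w_1 + w_2 = 1$ while their ratio equals $s$ in the linear case and $e^{s}$ in the exponential case. The function names and the sample values of $s$ are illustrative.

```python
import math

def linear_weights(s):
    """Weights with w1 / w2 = s and w1 + w2 = 1 (assumed normalization)."""
    return s / (s + 1.0), 1.0 / (s + 1.0)

def exponential_weights(s):
    """Weights with w1 / w2 = exp(s) and w1 + w2 = 1 (assumed normalization)."""
    ratio = math.exp(s)
    return ratio / (ratio + 1.0), 1.0 / (ratio + 1.0)

# Compare the two schemes for a few illustrative values of s.
for s in (1.0, 2.0, 4.0):
    lw1, lw2 = linear_weights(s)
    ew1, ew2 = exponential_weights(s)
    print(f"s={s}: linear=({lw1:.3f}, {lw2:.3f}) "
          f"exponential=({ew1:.3f}, {ew2:.3f})")
```

For any $s > 1$ the exponential scheme places more of the total weight on $w_1$ than the linear one, which reflects its stronger bias towards $|T \setminus V|$.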
Once the $mn$ Tversky indices have been computed for each possible pair of the $m$ ground truth communities $T_i$, $1 \le i \le m$, and the $n$ estimated ones $V_j$, $1 \le j \le n$, the similarity score $J(s)$ for a given $s$ is computed. Again, Chebyshev Walktrap with relocation outperforms the remaining algorithms, as it has the highest similarity with the reference communities. Note that the exponential weighting scheme sharpens the difference between the algorithms by raising the maximum scores and lowering the minimum ones.
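The pairwise step can be sketched as follows. The sketch only builds the $m \times n$ matrix of Tversky indices and, as a purely illustrative post-processing step, reports the best matching estimated community for each ground truth one. It does not reproduce the aggregation into $J(s)$. The function names, the linear weighting with $w_1 + w_2 = 1$, and the sample communities are assumptions.

```python
def tversky_index(template, variant, w1, w2):
    """Tversky index with w1 on |T minus V| and w2 on |V minus T|."""
    t, v = set(template), set(variant)
    common = len(t & v)
    denominator = common + w1 * len(t - v) + w2 * len(v - t)
    return common / denominator if denominator > 0 else 0.0

def tversky_matrix(templates, variants, w1, w2):
    """Row i, column j holds the index of variant j against template i."""
    return [[tversky_index(t, v, w1, w2) for v in variants]
            for t in templates]

# Hypothetical ground truth and estimated communities.
ground_truth = [{1, 2, 3, 4}, {5, 6, 7}]
estimated = [{1, 2, 3}, {4, 5, 6, 7}, {8}]

# Linear weighting scheme for an illustrative s (assumed normalization).
s = 2.0
w1, w2 = s / (s + 1.0), 1.0 / (s + 1.0)

matrix = tversky_matrix(ground_truth, estimated, w1, w2)

# Illustrative only: index of the best matching estimate per template.
best_match = [max(range(len(row)), key=row.__getitem__) for row in matrix]

for i, row in enumerate(matrix):
    print(i, [round(x, 3) for x in row], "best:", best_match[i])
```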
For the experiments of this section, the termination criterion $\tau_0$ was chosen to be a user supplied number of iterations, namely $|V| \log |V|$. This number of iterations is sufficiently large for generating communities in a reliable way. Moreover, each iteration is very quick, so the overall execution time was kept at an acceptable level despite the large number of iterations.
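As a rough indication of the magnitude of this budget, the following sketch evaluates $|V| \log |V|$ for a given number of vertices. The natural logarithm, the rounding up to an integer, and the lower bound of one iteration are assumptions, as the text does not specify them. The function name is hypothetical.

```python
import math

def iteration_budget(num_vertices):
    """Number of iterations |V| * log|V|, rounded up to an integer.

    The natural logarithm and the minimum of one iteration are
    assumptions of this sketch.
    """
    if num_vertices < 1:
        raise ValueError("the graph must have at least one vertex")
    return max(1, math.ceil(num_vertices * math.log(num_vertices)))

for n in (1_000, 100_000, 10_000_000):
    print(n, iteration_budget(n))
```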

Relocations
The analysis concludes with a summary of the relocations made by the Chebyshev Walktrap and the Markov Walktrap.
In Table 8, certain statistics regarding the random walker relocations are shown. Specifically, the first line presents the total number of relocations, whereas the second line shows the number of steps that the random walker makes before being relocated for the first time. Similarly, the last three lines contain the minimum, maximum, and average number of steps between two successive relocations, respectively.
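The five statistics described above can be derived from a log of the steps at which relocations occurred, as in the following sketch. The log format (an increasing list of step counts, each giving the number of steps made up to that relocation), the function name, and the sample values are hypothetical.

```python
def relocation_statistics(relocation_steps):
    """Summarize a walk from the step counts at which relocations occurred.

    relocation_steps is assumed to be a strictly increasing list, where
    each entry is the number of steps made up to that relocation.
    """
    if not relocation_steps:
        return {"total": 0, "steps_before_first": None,
                "min_gap": None, "max_gap": None, "avg_gap": None}
    # Steps between each pair of successive relocations.
    gaps = [later - earlier
            for earlier, later in zip(relocation_steps, relocation_steps[1:])]
    return {
        "total": len(relocation_steps),
        "steps_before_first": relocation_steps[0],
        "min_gap": min(gaps) if gaps else None,
        "max_gap": max(gaps) if gaps else None,
        "avg_gap": sum(gaps) / len(gaps) if gaps else None,
    }

# Hypothetical relocation log, for illustration only.
stats = relocation_statistics([120, 180, 450, 500])
for name, value in stats.items():
    print(name, value)
```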

Conclusions
The primary contribution of this article is the implementation over Neo4j of Chebyshev Walktrap, a community discovery algorithm designed for edge-fuzzy graphs, a class of fuzzy graphs used among others in [10], which is based on the Random Walker algorithmic principle. Additionally, Chebyshev Walktrap relies on the competitive factors of second order statistics through the Chebyshev inequality and of an optional relocation capability in order to bound unnecessarily costly walks, thus remaining inside a community, and to avoid being trapped for too long within the boundaries of a community, respectively. The relocation aspect was also backported to the Markov Walktrap algorithm first proposed in [15]. The effect of relocation on the community coherence was evaluated based on the asymmetric Tversky index using the Fuzzy Newman-Girvan algorithm from [15] as baseline, while its effect on the output distribution was assessed with the asymmetric Kullback-Leibler divergence. The latter was also the basis for evaluating the distance between the community size distribution generated by Fuzzy Newman-Girvan and the one computed by the Markov Walktrap and the Chebyshev Walktrap. In these cases, the introduction of asymmetry resulted in a clear distinction between the baseline data and their variants.
The test dataset was a large synthetic Kronecker graph whose edge fuzziness was controlled either by a binomial or by a Poisson distribution. In this dataset, our performance metrics showed that Chebyshev Walktrap yields more compact communities whose sizes are more clustered. Additionally, Markov Walktrap is, in many instances, slightly faster at the expense of a somewhat bigger memory footprint.
The experimental results of Section 5 hint at some future research directions. Community discovery algorithms that are more sophisticated from a probabilistic viewpoint should be able to exploit the asymmetry of the edge fuzziness distribution through higher order concentration inequalities such as the Talagrand inequality, provided their computation is efficient. Moreover, new metrics for community matching, perhaps utilizing functional or semantic information, should be developed. Additionally, methodologies for reliably assessing community coherence based on higher order structural or functional interactions should be sought. Finally, more experiments on larger graphs should be conducted in order to determine any inherent scalability limitations.

Table 2. Structural properties of graph Y.

Table 3. Performance in terms of time (sec) and vertex visits.

Table 5. Number of communities.