Connectivity of Random Geometric Hypergraphs

We consider a random geometric hypergraph model based on an underlying bipartite graph. Nodes and hyperedges are sampled uniformly in a domain, and a node is assigned to those hyperedges that lie within a certain radius. From a modelling perspective, we explain how the model captures higher-order connections that arise in real data sets. Our main contribution is to study the connectivity properties of the model. In an asymptotic limit where the number of nodes and hyperedges grow in tandem, we give a condition on the radius that guarantees connectivity.


Motivation
There is growing interest in the development of models and algorithms that capture group-level interactions [5,6,28]. For example, multiple co-authors may be involved in a collaboration, multiple workers may share an office space, and multiple proteins may contribute to a cellular process. In such cases representing the connectivity via a network of pairwise interactions is an obvious, and often avoidable, simplification. Hypergraphs, where any number of nodes may be grouped together to form a hyperedge, form a natural generalization. Hypergraph-based techniques have been developed for
• studying the propagation of disease or information [1,4,7,18,19,20],
• investigating the importance or structural roles of individual components [12,30,29],
• discovering and quantifying clusters [8,27,13],
• predicting future connections [33,32],
• inferring connectivity structure from time series data [21].
Just as in the pairwise setting, it is also of interest to consider processes that create hypergraphs [3,15]. Comparing generative hypergraph models against real data sets may help us to understand the mechanisms through which interactions arise. Furthermore, realistic models can be used to produce synthetic data sets on which to base simulations, and also to form null models for studying features of interest.
Models that use a geometric construction, with connectivity between elements determined by distance, have proved useful in many settings. Random geometric graphs were first introduced in [14] to model communication between radio stations, although the author also mentioned their relevance to the spread of disease. These models have subsequently proved useful in many application areas, ranging from studies of the proteome [16,26,17] to academic citations [31]. In many settings, the notion of distance may relate to an embedding of nodes into a latent space that captures key features. Here, similarity is interpreted in an indirect or abstract sense. Random geometric graphs have also been studied theoretically, with many interesting results arising from the perspectives of analysis, probability and statistical physics [22,24,10,2,25,9].
Our aim in this work is to motivate and analyse a random geometric hypergraph model. In a similar manner to [3], we make use of the connection between hypergraphs and bipartite graphs. The model is introduced and motivated in section 2, where we also show the results of illustrative computational experiments concerning connectivity. Our main contribution is to derive a condition on the thresholding radius that asymptotically guarantees connectivity of the hypergraph. The result is stated and proved in section 3. Directions for future work are described in section 4.

The Random Geometric Model and Its Connectivity
In this section we motivate and informally describe a geometric random hypergraph model, and computationally investigate its connectivity. We make use of a well-known equivalence between hypergraphs and bipartite graphs [3,11]. Suppose we are given an undirected bipartite graph, where nodes have been separated into two groups, A and B. By construction any edge must join one node in group A with one node in group B. We may form a hypergraph on the nodes in group A with the following rule:
• nodes in group A appear in the same hyperedge if and only if, in the underlying bipartite graph, they both have an edge to the same node in group B.
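The rule above can be sketched in a few lines of Python. This is only an illustration: the function name `hyperedges_from_bipartite`, the labels `s1` to `s4` for the group B nodes, and the edge list itself are assumptions, chosen so that the output matches the hyperedges listed in the caption of Figure 1.

```python
from collections import defaultdict


def hyperedges_from_bipartite(edges):
    """Group the A-side endpoints of bipartite edges by their B-side endpoint."""
    members = defaultdict(set)
    for a, b in edges:
        members[b].add(a)
    # One hyperedge per group-B node that has at least one neighbour.
    return {b: frozenset(nodes) for b, nodes in members.items()}


# Hypothetical edge list of (group A node, group B node) pairs.
edges = [(1, "s1"), (2, "s2"), (3, "s2"),
         (4, "s3"), (5, "s3"), (6, "s3"), (7, "s3"),
         (5, "s4"), (6, "s4"), (7, "s4")]

hyperedges = hyperedges_from_bipartite(edges)
print(sorted(sorted(h) for h in hyperedges.values()))
# → [[1], [2, 3], [4, 5, 6, 7], [5, 6, 7]]
```

Each group B node contributes exactly one hyperedge, so two group A nodes share a hyperedge precisely when they share a group B neighbour.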
In this way the nodes in set B may be viewed as hyperedge "centres." Two nodes from group A that are attracted to the same centre are allocated to the same hyperedge.
In many graph settings there is a natural concept of distance between nodes. For example, in social networks, geographical distance between places of work or residence may play a strong role in determining connectivity. More generally, there may be a more nuanced set of features (hobbies, tastes in music, pet ownership, ...) that help to explain whether pairwise relationships arise. This argument extends readily to the bipartite/hypergraph scenario. Hyperedge centres may correspond, for example, to shops, office buildings, gyms, train stations, restaurants, concert venues, churches, ..., with an individual joining a hyperedge if they are sufficiently close to that centre; for example, exercising at a local gym. In the absence of specific information, it is natural to assume that the features possessed by a node arise at random, so that a node is randomly embedded in $\mathbb{R}^d$ for some dimension $d$. In a similar way, we may simultaneously embed our hyperedge centres in $\mathbb{R}^d$, and assign a node to a hyperedge if and only if it is within some threshold distance of the centre.
Figure 1 illustrates the idea in the two-dimensional case. We have a bipartite graph with two types of nodes. Groups A and B are represented by circles and stars, respectively. We form a hypergraph by placing a circle node in a hyperedge if and only if it is within a certain distance of the corresponding star. Colours in the figure distinguish between the different hyperedges. We emphasize that mathematically the resulting hypergraph consists only of the list of hypergraph nodes and hyperedges. Information about the existence/number of hyperedge centres and the locations of all nodes in $\mathbb{R}^2$ is lost.
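As a minimal sketch of this thresholding step, the following Python code embeds a handful of nodes and centres uniformly in the unit square and returns only the hyperedge list. The function name `geometric_hypergraph`, the point counts, the radius 0.3 and the random seed are all illustrative assumptions rather than values taken from the text.

```python
import numpy as np


def geometric_hypergraph(nodes, centres, radius):
    """Return one hyperedge (tuple of node indices) per hyperedge centre."""
    # Pairwise Euclidean distances, shape (num_centres, num_nodes).
    dist = np.linalg.norm(centres[:, None, :] - nodes[None, :, :], axis=2)
    # A node joins a hyperedge iff it lies within `radius` of that centre.
    return [tuple(int(i) for i in np.flatnonzero(row < radius)) for row in dist]


rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
d = 2
nodes = rng.random((7, d))       # 7 nodes, uniform in the unit square
centres = rng.random((4, d))     # 4 hyperedge centres
hyperedges = geometric_hypergraph(nodes, centres, radius=0.3)
print(hyperedges)
```

The returned list carries no coordinates, reflecting the point that the embedding is discarded once the hypergraph has been formed. With a small radius some hyperedges may be empty.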
Our aim in this work is to study connectivity: a basic property that is of practical importance in many areas, including disease propagation, communication and percolation.We consider the random geometric hypergraph to be connected if the underlying random geometric bipartite graph is connected.We focus on the smallest distance threshold that produces a connected network and study an asymptotic limit where the number of nodes tends to infinity.
We motivate our analytical results with computational experiments. To produce Figure 2, we formed random geometric bipartite graphs based on $n$ points embedded in $\mathbb{R}^2$. For each graph, the points had components chosen uniformly and independently in $(0, 1)$. We separated these points into two groups of size $n_1 = 0.8n$ and $n_2 = 0.2n$. We then used a bisection algorithm to compute the smallest radius $r$ that produced a connected bipartite graph. In other words, we found the smallest $r$ such that a connected graph arose when we created edges between pairs of nodes from different groups that were separated by Euclidean distance less than $r$. (Equivalently, we assigned $n_1 = 0.8n$ points to the role of nodes in a random geometric hypergraph and $n_2 = 0.2n$ points to the role of hyperedge centres, and computed the smallest node-hyperedge centre radius that gave connectivity.) We ran the experiment for a range of $n$ values between $10^3$ and $10^4$. For each choice of $n$ we repeated the computation for 500 independent random node embeddings. Figure 2 shows the mean, maximum, and minimum radius arising for each $n$. Note that the axes are scaled logarithmically. We have superimposed a reference line of the form $Cn^{-1/2}$, which is seen to be consistent with the behaviour of the radius.
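One way to carry out a single trial of such an experiment is sketched below. It is not the code behind Figure 2: the function names, the tolerance `tol`, the seed and the small value n = 500 are assumptions chosen so that the dense distance matrix stays small.

```python
import numpy as np


def is_connected(dist, r):
    """Breadth-first search on the bipartite graph with an edge where dist < r.

    dist has shape (n1, n2): rows are nodes, columns are hyperedge centres.
    """
    adj = dist < r
    n1, n2 = adj.shape
    seen_nodes = np.zeros(n1, dtype=bool)
    seen_centres = np.zeros(n2, dtype=bool)
    seen_nodes[0] = True
    frontier = np.array([0])
    on_nodes = True                  # which side the current frontier lives on
    while frontier.size:
        if on_nodes:
            reach = adj[frontier].any(axis=0) & ~seen_centres
            seen_centres |= reach
        else:
            reach = adj[:, frontier].any(axis=1) & ~seen_nodes
            seen_nodes |= reach
        frontier = np.flatnonzero(reach)
        on_nodes = not on_nodes
    return bool(seen_nodes.all() and seen_centres.all())


def connectivity_radius(nodes, centres, tol=1e-6):
    """Bisect for the smallest radius giving a connected bipartite graph."""
    dist = np.linalg.norm(nodes[:, None, :] - centres[None, :, :], axis=2)
    lo = 0.0                                   # no edges: disconnected
    hi = float(dist.max()) * (1 + 1e-9) + 1e-12  # all edges: connected
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if is_connected(dist, mid):
            hi = mid
        else:
            lo = mid
    return hi


rng = np.random.default_rng(1)       # arbitrary seed
n, d = 500, 2
n1 = (4 * n) // 5                    # 0.8n nodes
points = rng.random((n, d))          # uniform in (0, 1)^d
r = connectivity_radius(points[:n1], points[n1:])
print(r)
```

Bisection is valid here because connectivity is monotone in the radius: enlarging r can only add edges. Repeating the last four lines over many seeds and values of n would give the kind of mean, maximum and minimum summary described above.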
Figures 3 and 4 repeat these computations with points embedded into $\mathbb{R}^4$ and $\mathbb{R}^{10}$, respectively. We see that the behaviour remains consistent with a decay roughly proportional to, and perhaps slightly slower than, $n^{-1/d}$ for dimension $d$.
In the next section we formalize our definition of a random geometric hypergraph and establish an upper bound on the radius decay rate for connectivity that agrees with $n^{-1/d}$, up to log-dependent factors (which of course would be extremely difficult to pin down in computational experiments). We also note for comparison that a threshold of the form $(\log(n)/n)^{1/d}$ has previously arisen in the study of random geometric graphs [2,23].
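To see why a logarithmic factor is hard to detect on log-log axes, consider the following short calculation. It assumes, purely for illustration, that the radius behaves exactly like $r(n) = C(\log(n)/n)^{1/d}$ with $\log$ the natural logarithm; the analysis in this paper establishes only an upper bound of this type.

```latex
\log r(n) = \log C + \frac{1}{d}\bigl(\log\log n - \log n\bigr),
\qquad
\frac{\mathrm{d}\,\log r}{\mathrm{d}\,\log n}
  = -\frac{1}{d}\Bigl(1 - \frac{1}{\log n}\Bigr).
% At n = 10^3: 1/\log n \approx 0.145, so the local slope is about -0.855/d.
% At n = 10^4: 1/\log n \approx 0.109, so the local slope is about -0.891/d.
```

Under this assumption the local slope over the range $10^3 \le n \le 10^4$ differs from $-1/d$ by roughly 11 to 15 percent, which would appear as a decay slightly slower than $n^{-1/d}$.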
In related work, we note that Barthelemy [3] proposed and studied a wide class of random hypergraph models, including examples where nodes are embedded in space and connections arise via a distance measure. That approach to defining a random geometric hypergraph differs from ours by assuming that the number of hyperedges is given and by considering a process where new nodes are added to the network, with new connections arising based on the current hyperedge memberships ([3, Figure 6]).

Connectivity Analysis
We now give a formal definition of a random geometric hypergraph and show that under reasonable conditions a thresholding radius of order $(\log(n)/n)^{1/d}$ ensures connectivity, asymptotically.
Let $D$ be a bounded Euclidean domain in $\mathbb{R}^d$, such that $D$ has Lipschitz boundary. Given $n \in \mathbb{N}$, we let $P_n$ be a Poisson point process sampled from $D$ with respect to some continuous and bounded distribution $f$, such that $f > 0$ everywhere on $D$. We use $|\cdot|$ to denote the Euclidean norm. For $n \in \mathbb{N}$, let $n_1$ be the expected number of nodes and $n_2$ be the expected number of hyperedges, chosen such that $n = n_1 + n_2$. Let $r_n$ be a function of $n$, tending to 0 as $n \to \infty$.
Definition 1. Let $G(n_1, n_2, r_n)$ be the probability space on the set of geometric hypergraphs, where the random nodes are chosen as a Poisson point process $P_{n_1}$ in $D$ sampled with respect to $f$, the random hyperedges are induced by another Poisson point process $P_{n_2}$ in $D$ sampled with respect to $f$, and where, using bipartite graph-hypergraph equivalence, a node $x \in P_{n_1}$ and a hyperedge $y \in P_{n_2}$ are connected by an edge if $|x - y| < r_n$.
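A draw from a model of this type can be sketched as follows. The sketch makes two simplifying assumptions that are not part of Definition 1: the domain $D$ is the unit cube $(0,1)^d$ and $f$ is the uniform density, so that each process is homogeneous. The function names are hypothetical.

```python
import numpy as np


def sample_poisson_process(expected_count, d, rng):
    """Homogeneous Poisson point process on the unit cube (assumed domain)."""
    count = rng.poisson(expected_count)   # random number of points
    return rng.random((count, d))         # uniform locations given the count


def sample_G(n1, n2, r, d, rng):
    """Sample nodes, hyperedge centres and the node-by-centre incidence matrix."""
    nodes = sample_poisson_process(n1, d, rng)
    centres = sample_poisson_process(n2, d, rng)
    diff = nodes[:, None, :] - centres[None, :, :]
    incidence = np.linalg.norm(diff, axis=2) < r   # edge iff |x - y| < r
    return nodes, centres, incidence


rng = np.random.default_rng(0)            # arbitrary seed
nodes, centres, incidence = sample_G(n1=80, n2=20, r=0.2, d=2, rng=rng)
print(nodes.shape[0], centres.shape[0], int(incidence.sum()))
```

Note that $n_1$ and $n_2$ are only expected counts: the numbers of nodes and hyperedge centres actually drawn are Poisson random variables, in contrast to the fixed group sizes used in the experiments of the previous section.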
Suppose that the expected number of nodes $n_1$ and of hyperedges [...] Equivalently, this means that $n_1$ and $n_2$ as functions of $n$ satisfy [...] Let $K > 0$ be the smallest constant such that for all $n \in \mathbb{N}$, [...]
Partition $\mathbb{R}^d$ into a grid of cubes $\{C_{i,n}\}_i$ of width $\gamma r_n$, where $r_n = o(1)$ and $\gamma > 0$ is to be determined. Let $S_n := \{i \mid C_{i,n} \subset D\}$, and for each $i \in S_n$, let $I(i, n) := \{j \notin S_n \mid C_{j,n} \text{ is adjacent to } C_{i,n}\}$, and let [...] Since $D$ has a Lipschitz boundary, by compactness there exists $C > 0$ depending on $D$ and $d$ (but not on $\gamma$), such that we can choose $n_0 \in \mathbb{N}$ sufficiently large such that for all $n \geq n_0$ and all $i \in S_n$ [...] We then choose $\gamma := 1/C$, so that for all $i \in S_n$, [...] Note also that we have [...]
[...] and suppose that $r_n$ satisfies [...] (3), where $w(n) \to \infty$ arbitrarily slowly as $n \to \infty$. With probability tending to 1 as $n \to \infty$, for all $i \in S_n$, $P_m \cap Q_{i,n} \neq \emptyset$.
Proof. It suffices to show that the RHS in [...] tends to 0 as $n \to \infty$. Since $P_m$ is a homogeneous Poisson point process, we have, using [...], for all $i \in S_n$ [...] and by the pigeonhole principle, [...] Note that with our choice of $\gamma$, we have [...] Hence [...]
Lemma 3.1 gives us a lower bound estimate on the decay of $r_n$ as a function of $n$, to ensure that the balls centred at the points of $P_m$ and of radius $r_n$ tend to form a covering of the domain $D$ as $n \to \infty$. This is an asymptotic result. From a practical point of view, it is more useful to have a non-asymptotic version of Lemma 3.1, even if we must slightly strengthen the constraint on the decay of $r_n$. This is the object of Lemma 3.2.
Lemma 3.2 (Non-asymptotic coverage). Suppose that $m$ as a function of $n$ satisfies, for all $n \in \mathbb{N}$, [...] and suppose this time that $r_n$ satisfies [...] (4) for some fixed $\epsilon > 0$. Then a.s. there exists $N \in \mathbb{N}$ such that for all $n \geq N$ and all $i \in S_n$, $P_m \cap Q_{i,n} \neq \emptyset$.
Proof. A proof proceeds similarly to that of Lemma 3.1, but the different constraint on $r_n$ yields instead [...] and the required result then follows by the Borel-Cantelli lemma, since then the series [...]
We believe that the lower bound condition on the decay of $r_n$ found in Lemma 3.1 is sharp, and that the lower bound condition in Lemma 3.2 is close to being sharp. In Theorem 3.3 we apply Lemmas 3.1 and 3.2 to obtain a sufficient lower bound condition on $r_n$ for the connectivity of random geometric hypergraphs, with an extra factor of 2. We suspect that this factor could be reduced by more sophisticated analysis.
Theorem 3.3. For every $n \in \mathbb{N}$, let $(n_1, n_2) \in \mathbb{N}^2$ satisfy (1) and $n = n_1 + n_2$.
• If $r_n$ satisfies (3), then with probability tending to 1 as $n \to \infty$, the random graph $G(n_1, n_2, 2r_n)$ is connected.
• If $r_n$ satisfies (4), then a.s. there exists $N \in \mathbb{N}$, such that for all $n \geq N$, the random geometric bipartite graph $G(n_1, n_2, 2r_n)$ is connected.
Proof.The result is a consequence of Lemmas 3.1 and 3.2 and the triangle inequality.
Suppose that $n \in \mathbb{N}$ is such that for all $i \in S_n$,
$$P_{n_1} \cap Q_{i,n} \neq \emptyset \quad \text{and} \quad P_{n_2} \cap Q_{i,n} \neq \emptyset. \tag{5}$$
Given two points $x, y \in P_{n_1}$, we can find a path of adjacent cubes from $Q_n$, such that the first cube contains $x$ and the last cube contains $y$. From (2) and the triangle inequality, the distance between a point in one cube and another point in an adjacent cube is at most $2r_n$. Since for each cube in the path we can find a point from $P_{n_1}$ and a point from $P_{n_2}$, we can then form a path of edges of length at most $2r_n$ from $x$ to $y$, alternating between points in $P_{n_1}$ and points in $P_{n_2}$, and such a path is then a path in $G(n_1, n_2, 2r_n)$. This shows connectivity of the graph for all $n$ satisfying condition (5).
This condition is true with probability tending to 1 as $n \to \infty$, if we assume that $r_n$ satisfies (3), using Lemma 3.1 with $n_1$ and $n_2$ instead of $m$, giving us the first part of the theorem.
Using Lemma 3.2 with $n_1$ and $n_2$ instead of $m$, if $r_n$ satisfies (4), there exists $N \in \mathbb{N}$ such that (5) is true for all $n \geq N$, giving us the second part of the theorem.
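The cell-occupancy argument can be checked numerically in a simplified setting. The sketch below is not the construction used in the proof: it assumes $D = (0,1)^d$ with uniform $f$, tiles $D$ exactly with $k^d$ cells of width $w = 1/k$, and uses the elementary bound that two points in the same or in face-adjacent cells are at distance less than $w\sqrt{d+3}$. The function names and parameter values are illustrative assumptions.

```python
import numpy as np


def cells_all_occupied(points, k):
    """True if each of the k**d cells of width 1/k contains a point."""
    d = points.shape[1]
    idx = np.minimum((points * k).astype(int), k - 1)   # per-axis cell index
    flat = np.ravel_multi_index(tuple(idx.T), (k,) * d)
    return np.unique(flat).size == k ** d


def bipartite_connected(adj):
    """Depth-first search over a boolean node-by-centre adjacency matrix."""
    n1, n2 = adj.shape
    seen = np.zeros(n1 + n2, dtype=bool)
    seen[0] = True
    stack = [0]
    while stack:
        v = stack.pop()
        # Vertices 0..n1-1 are nodes; n1..n1+n2-1 are hyperedge centres.
        if v < n1:
            nbrs = n1 + np.flatnonzero(adj[v])
        else:
            nbrs = np.flatnonzero(adj[:, v - n1])
        for u in nbrs[~seen[nbrs]]:
            seen[u] = True
            stack.append(int(u))
    return bool(seen.all())


rng = np.random.default_rng(2)                 # arbitrary seed
d, k = 2, 4
nodes = rng.random((rng.poisson(400), d))      # expected 400 nodes
centres = rng.random((rng.poisson(400), d))    # expected 400 centres
w = 1.0 / k
r = w * np.sqrt(d + 3)                         # bound for face-adjacent cells

occupied = cells_all_occupied(nodes, k) and cells_all_occupied(centres, k)
dist = np.linalg.norm(nodes[:, None, :] - centres[None, :, :], axis=2)
connected = bipartite_connected(dist < r)
print(occupied, connected)
```

Whenever every cell contains both a node and a centre, the graph with radius $w\sqrt{d+3}$ must be connected, by the same chain-of-cells reasoning as in the proof. The converse need not hold, and the constant $\sqrt{d+3}$ is specific to this simplified grid.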

Discussion
There are a number of promising avenues for further work in this area. From a theoretical perspective, it would be of interest to derive useful upper bounds or indeed sharp expressions for the exact connectivity radius threshold associated with this class of random geometric hypergraphs. More general hypergraph models could also be developed and studied, for example using a softer version of the distance cut-off that has been considered in the graph setting [9,25].
From a more practical viewpoint, the related inverse problem is both challenging and potentially useful: given a data set that corresponds to a hypergraph, for the model considered here what is the best choice of (a) embedding dimension, (b) node locations, and (c) hyperedge centre locations? A similar question was addressed in [15] for a different generative random hypergraph model based on the assumption that nodes are located in a latent space and hyperedges arise preferentially between nearby nodes (without the concept of hyperedge centres). This challenge also leads into the model selection question: given a data set and a selection of hypergraph models, which one best describes the data, and what insights arise?

Figure 1: When this construction is regarded as a bipartite graph, the solid circles and solid stars represent two types of node. Edges are created only between nodes of different types; this happens if and only if they are close enough in Euclidean distance. When regarded as a hypergraph, the solid circles represent nodes and the solid stars represent "centres" of hyperedges. A node is a member of a hyperedge if and only if it is sufficiently close to the corresponding centre. Mathematically, the resulting hypergraph may be defined by labelling the nodes {1, 2, 3, 4, 5, 6, 7} and listing the hyperedges as {1}, {2, 3}, {4, 5, 6, 7} and {5, 6, 7}.

Figure 2: Euclidean distance at which the geometric random hypergraph becomes connected. Here we have $0.8n$ nodes and $0.2n$ hyperedge centres in $\mathbb{R}^2$, for values of $n$ between $10^3$ and $10^4$. The plots show the mean, maximum and minimum value of this distance over 500 independent trials. A reference slope corresponding to $Cn^{-1/2}$ is shown. Axes are logarithmically scaled.

Figure 3: As for Figure 2 with nodes embedded in $\mathbb{R}^4$ and a reference slope corresponding to $Cn^{-1/4}$.

Figure 4: As for Figure 2 with nodes embedded in $\mathbb{R}^{10}$ and a reference slope corresponding to $Cn^{-1/10}$.