Article

Nested Ensemble Learning with Topological Data Analysis for Graph Classification and Regression

Department of Mathematics and Statistics, Bowling Green State University, Bowling Green, OH 43403, USA
*
Author to whom correspondence should be addressed.
Int. J. Topol. 2025, 2(4), 17; https://doi.org/10.3390/ijt2040017
Submission received: 12 August 2025 / Revised: 8 September 2025 / Accepted: 24 September 2025 / Published: 14 October 2025
(This article belongs to the Special Issue Feature Papers in Topology and Its Applications)

Abstract

We propose a nested ensemble learning framework that utilizes Topological Data Analysis (TDA) to extract and integrate topological features from graph data, with the goal of improving performance on classification and regression tasks. Our approach computes persistence diagrams (PDs) using lower-star filtrations induced by three filter functions: closeness, betweenness, and degree 2 centrality. To overcome the limitation of relying on a single filter, these PDs are integrated through a data-driven, three-level architecture. At Level-0, diverse base models are independently trained on the topological features extracted for each filter function. At Level-1, a meta-learner combines the predictions of these base models for each filter to form filter-specific ensembles. Finally, at Level-2, a meta-learner integrates the outputs of these filter-specific ensembles to produce the final prediction. We evaluate our method on both simulated and real-world graph datasets. Experimental results demonstrate that our framework consistently outperforms base models and standard stacking methods, achieving higher classification accuracy and lower regression error. It also surpasses existing state-of-the-art approaches, ranking among the top three models across all benchmarks.

1. Introduction

Graphs serve as natural representations of complex systems in a wide range of domains, including social networks, biological systems, and infrastructure networks. With the increasing prevalence of graph-structured data, there is a growing demand for models that can effectively capture both local and global structural patterns to support predictive tasks such as classification and regression. Ensemble learning has emerged as a particularly powerful strategy in this context. By combining multiple base models, ensemble methods enhance predictive accuracy, reduce overfitting, and improve generalization. Among these techniques, stacked generalization (or stacking) [1] is notable for its ability to integrate diverse base learners through a higher-level meta-learner. This architecture allows the model to leverage complementary patterns learned by each base learner, making it well-suited for heterogeneous and structurally complex datasets such as graphs. The theoretical advantage of stacking lies in its ability to balance the bias–variance trade-off, which is particularly crucial in domains characterized by irregular, non-Euclidean structures.
A key avenue for enhancing stacking-based approaches on graph data is the incorporation of domain-specific structural features that reflect the underlying topology of the graph. In this regard, Topological Data Analysis (TDA) provides a principled, mathematically grounded framework for these purposes. Rooted in algebraic topology and computational geometry, TDA offers tools to study the shape structure underlying data [2]. The most widely used tool in TDA is persistent homology, which tracks how topological features (such as connected components, loops, and voids) appear and disappear as a graph is analyzed at different scales. This is achieved by building a sequence of nested topological spaces, known as a filtration, on top of the data and recording the birth and death of topological features as the filtration progresses. The evolution of these features is summarized in a persistence diagram, where each point represents a feature, with its coordinates indicating the scale at which it was “born” and “died.” This provides a compact summary of the data’s shape across multiple scales. Unlike traditional graph-theoretic metrics, which are limited to local connections, persistent homology captures both local and global patterns. This makes it a robust method for analyzing graph structure, as it is less sensitive to noise and small changes in the data.
A central step in applying TDA is choosing an appropriate filtration [2], a nested sequence of simplicial complexes (collections of vertices, edges, and higher-dimensional geometric shapes called simplices) used to track topological features. Commonly used filtrations, such as the Vietoris–Rips and Čech, are designed for point clouds in a metric space, where simplices are added based on pairwise distances. However, this approach is problematic for graph-structured data for several reasons. While it is possible to embed a graph into a metric space [3,4], doing so often leads to a potential loss of crucial node connectivity information. Furthermore, applying metric-based filtrations to the resulting point cloud becomes computationally expensive for large graphs, as the Vietoris–Rips complex can grow exponentially with the number of nodes. For graph data, an effective alternative is the lower-star filtration [5], which constructs a valid filtration using a scalar-valued function defined on the graph’s nodes or edges. This method circumvents the need for an explicit metric embedding by leveraging node-level attributes like degree, centrality, or spectral features to dictate the order in which simplices are added to the filtration. A simplex enters the filtration at the maximum function value of its vertices, satisfying a sublevel set condition that ensures the filtration is nested and mathematically well-defined for persistent homology. By utilizing node-level functions rather than pairwise distances, the lower-star filtration respects the graph’s intrinsic structure and seamlessly integrates both topological and attribute-based information, avoiding the computational and information-loss issues associated with metric embeddings [6].
While the lower-star filtration offers a flexible and mathematically valid framework for analyzing graph-structured data based on node or edge attributes, a significant challenge lies in selecting an appropriate scalar function to assign filtration values, as its choice critically influences the resulting topological summaries. Common options include node attributes such as degree [7], spectral embeddings [8], or domain-specific features [9,10]. A poorly chosen function can obscure relevant topological signals or introduce spurious features, potentially degrading model performance. Since no single function is universally optimal, relying on a fixed, manually selected filter, which is common in many existing TDA approaches [7,8,9,10], can restrict a model’s flexibility and expressive power, particularly when that function fails to highlight the topological patterns most relevant to the learning task.
To move beyond the limitations of single, hand-crafted filters, recent research has explored several advanced strategies. One approach is the use of multi-filter filtrations, where diverse filtrations are constructed from multiple structural perspectives, and their persistence diagrams are vectorized and concatenated into a single feature vector for downstream tasks [11,12]. A complementary, more sophisticated direction involves learning the filter functions directly from data, embedding their design into the learning process so that filtrations are adaptively determined rather than fixed in advance. For example, frameworks for point clouds adaptively learn filtrations via neural networks [13] or through topological autoencoders that use a persistent homology loss to preserve structure [14]. For graphs, frameworks like TOGL extend this idea by learning topology-aware representations end-to-end within a graph neural network, removing the reliance on fixed descriptors [15]. A third, complementary but more complex strategy is multiparameter persistent homology (MPH), which applies filtrations across multiple independent parameters simultaneously [9,10,16]. While MPH captures a richer mathematical object that describes the interactions between different filtration parameters, it also introduces significant computational and interpretational challenges. Despite recent efforts to create vectorized summaries that make MPH more accessible [17], these trade-offs highlight the continued relevance of alternative multi-filter strategies.
Building on these advances, we propose a model-based, data-driven ensemble learning framework that integrates multiple filter functions, with their contributions adaptively optimized through a principled, two-level meta-learning process. Unlike previous methods that either treat each filter independently or simply concatenate their features, our approach first computes persistence diagrams for multiple filter functions using lower-star filtrations. Each filter function, defined by a specific node attribute, guides the construction of the simplicial complex. These persistence diagrams are then vectorized into distinct topological feature sets, which serve as the foundation for our model building.
The key innovation of our framework lies in how we combine these multiple filter functions in a fully data-driven manner. The process involves a three-level architecture:
  • Base Model Training (Level-0): For each filter function, a diverse set of base machine learning models is trained independently. This parallel training strategy enables each model to specialize in identifying the unique predictive patterns present within the topological features extracted by its corresponding filter.
  • Filter-specific Ensemble (Level-1): For each filter function, a first-level meta-learner is trained to optimally combine the predictions of that filter’s diverse base models from Level-0. This process results in a single, refined, stacked ensemble prediction for each filter function.
  • Cross-filter Ensemble (Level-2): A second-level meta-learner is trained on the outputs from all filter-specific ensembles from Level-1. This meta-learner learns to adaptively assign importance weights to each filter, thereby empirically determining its relative influence on the final nested ensemble prediction.
This hierarchical weighting mechanism, grounded in empirical evidence, directly learns which filters are most informative for the task. It eliminates the need for manual filter selection, as even randomly chosen or unconventional filters can be included and their relevance automatically assessed. The model dynamically prioritizes filters that contribute most to predictive performance while down-weighting less useful ones. By unifying multiple topological perspectives through this automated ensemble structure, our method consistently enhances performance on downstream tasks such as classification and regression.
We validate our approach on both synthetic and real-world data, including several widely used benchmark datasets from the TUDataset collection [18]: IMDB-BINARY, IMDB-MULTI, REDDIT-BINARY, and PROTEINS. The results demonstrate that our proposed nested ensemble framework consistently outperforms both the individual base models and standard stacking methods across classification and regression tasks. Moreover, topological features derived from persistent homology show superior performance compared to traditional graph-theoretic descriptors, underscoring the benefits of integrating TDA with ensemble learning for graph-based machine learning. On the TUDataset benchmarks, we further demonstrate the effectiveness of our method by comparing it to the top five existing topological approaches. Our preliminary results are promising and indicate that the information extracted by our framework can be effectively utilized for graph classification tasks.
The remainder of this paper is organized as follows. Section 2 introduces the theoretical foundations of persistent homology and describes how we extract topological features from graphs. Section 3 details the proposed nested ensemble learning framework. Section 4 presents our experimental results, and Section 5 provides a summary and discusses directions for future research.

2. Background and Preliminaries

In this section, we introduce the foundational concepts of topological data analysis (TDA), focusing on simplicial complexes, filtrations, homology, and persistent homology, following the notation and conventions in [19,20,21]. These concepts form the theoretical basis for the methodology developed in this study.

2.1. TDA Background Theory

Topological Data Analysis (TDA) provides a mathematical framework for uncovering the intrinsic geometric and topological structures underlying data. A central tool in TDA is persistent homology, which tracks the evolution of topological features (e.g., connected components, loops, and voids) across multiple scales. The resulting output is often summarized in the form of a persistence diagram (PD), a multiset of points $(b, d) \in \mathbb{R}^2$ (a multiset of points means that different topological features can have the same birth and death times in the persistence diagram), where each point corresponds to a topological feature that appears (is “born”) at scale b and disappears (or “dies”) at scale d (see Figure 5). These diagrams serve as topological signatures of the data, capturing information such as connected components, loops, and voids across multiple scale values.
To compute a persistence diagram, we must first equip the data (often given as a point cloud, image, or graph) with a topological structure through the construction of a simplicial complex. A simplicial complex provides a combinatorial framework that captures relationships between data points. This is accomplished through various constructions, such as the Vietoris–Rips, Čech, or lower star filtration, which approximate the underlying topological structure of the space from which the data were sampled. This process allows for the systematic study of topological features using tools from algebraic topology, such as homology and persistent homology.
An (abstract) simplicial complex K is a collection of finite subsets of a vertex set V, satisfying the following two properties. First, every vertex $v \in V$ must be included as a singleton set $\{v\} \in K$. Second, the complex must be closed under taking subsets, i.e., if $\tau \in K$, then any subset $\sigma \subseteq \tau$ is also in K. In this context, any subset $\tau \in K$ containing $p+1$ vertices is called a p-simplex and has dimension p, denoted $\dim(\tau) = p$. A subset $\sigma \subseteq \tau$ is called a face of $\tau$, written $\sigma \preceq \tau$. The set of all simplices in K of dimension d is denoted by $K_d$, and the overall dimension of K is the highest dimension among its simplices. There is a geometric realization of simplicial complexes in a suitable Euclidean space, in which a vertex corresponds to a 0-simplex, an edge to a 1-simplex, a filled triangle to a 2-simplex, a tetrahedron to a 3-simplex, and so on (see Figure 1); in this realization, different simplices intersect only along their common faces. Example 1 illustrates both the concept of a simplicial complex and the process by which it is constructed.
Example 1. 
Consider the simplicial complex X shown in Figure 2, which has the vertex set $V = \{a, b, c, d\}$ and the collection of simplices $K = \{\{a,b,c\}, \{a,b\}, \{a,c\}, \{b,c\}, \{b,d\}, \{c,d\}, \{a\}, \{b\}, \{c\}, \{d\}\}$. The 0-simplices, denoted $K_0 = \{\{a\}, \{b\}, \{c\}, \{d\}\}$, are simply the vertices themselves. The 1-simplices, $K_1 = \{\{a,b\}, \{a,c\}, \{b,c\}, \{b,d\}, \{c,d\}\}$, represent the edges connecting these vertices. Finally, the 2-simplex $K_2 = \{\{a,b,c\}\}$ corresponds to the filled triangle formed by vertices a, b, and c.
A key step in TDA involves constructing a filtration—a nested sequence of simplicial complexes that progressively captures the evolving topological features of the data at multiple scales or thresholds. One common approach to building filtrations is through the sublevel sets of a monotone function defined on the simplices of a simplicial complex. Specifically, let K be a simplicial complex, and let $g : K \to \mathbb{R}$ be a real-valued function defined on its simplices. The function g is said to be monotone if, for every pair of simplices $\sigma \subseteq \tau$ in K, it satisfies $g(\sigma) \le g(\tau)$. This condition ensures that for each threshold $t \in \mathbb{R}$, the sublevel set $K_t = \{\sigma \in K \mid g(\sigma) \le t\}$ forms a valid simplicial subcomplex of K, because if a simplex is in $K_t$, its faces are also guaranteed to be in $K_t$ by the monotonicity property. Let $\{c_1 < c_2 < \cdots < c_N\}$ denote the ordered set of distinct function values taken by g. The sublevel set filtration induced by g is then given by the nested sequence:
$$\emptyset \subseteq K_1(g) \subseteq K_2(g) \subseteq \cdots \subseteq K_N(g) = K,$$
where each $K_i(g) = \{\sigma \in K \mid g(\sigma) \le c_i\}$. The number of filtration steps N corresponds to the number of unique function values attained by g over the simplices in K.

2.1.1. Homology and Persistent Homology

To understand how topological features such as connected components, loops, and voids evolve across a filtration, we need algebraic tools that can precisely capture these structures. This is exactly the role of homology, which formalizes these features using algebraic objects called homology groups.
Given a sequence of nested simplicial complexes—called a filtration—we can apply homology to each complex to systematically identify and track topological features across multiple scales. The technique of persistent homology tracks how these features, such as connected components ($H_0$), loops ($H_1$), and voids ($H_2$), evolve and persist over the course of the filtration. At each stage, a simplicial complex K is constructed from a set of vertices, edges, triangles, and possibly higher-dimensional simplices. To uncover the topological information contained in these complexes, we associate to them algebraic structures called chain groups. For each dimension d, the chain group $C_d(K)$ is defined as the vector space (or free module) generated by the d-simplices in K, with coefficients chosen in a fixed field (like $\mathbb{Z}_2$ or $\mathbb{R}$). An element of $C_d(K)$ is called a d-chain. It is a linear combination of d-simplices, written as
$$c = a_1 \sigma_1 + a_2 \sigma_2 + \cdots + a_m \sigma_m,$$
where the $\sigma_i$ are d-simplices in K and the $a_i$ are coefficients from the chosen field. Next, we connect different dimensions using boundary operators $\partial_d : C_d(K) \to C_{d-1}(K)$. These operators map each d-simplex to a sum of its $(d-1)$-dimensional faces. For example, for a simplex $[v_0, v_1, \ldots, v_d]$,
$$\partial_d\big([v_0, v_1, \ldots, v_d]\big) = \sum_{j=0}^{d} (-1)^j \, [v_0, \ldots, \hat{v}_j, \ldots, v_d],$$
where $\hat{v}_j$ means that the vertex $v_j$ is omitted. When working over $\mathbb{Z}_2$, we can ignore the signs and simply track whether faces are present or absent.
A key property of the boundary operator is that applying it twice yields zero, i.e., $\partial_{d-1} \circ \partial_d = 0$. We define a d-cycle as a d-chain whose boundary is zero, i.e., it is in the kernel of the boundary operator $\partial_d$. This space is denoted by $Z_d(K) = \ker(\partial_d)$. Similarly, a d-boundary is a d-chain that is the boundary of some $(d+1)$-chain. These form the image of the boundary operator $\partial_{d+1}$, denoted by $B_d(K) = \operatorname{im}(\partial_{d+1})$. Because applying the boundary operator twice yields zero ($\partial_{d-1} \circ \partial_d = 0$), every boundary is automatically a cycle. This means we have the following subgroup relationship: $B_d(K) \subseteq Z_d(K)$.
The d-th homology group is then defined as the quotient:
$$H_d(K) = Z_d(K) / B_d(K),$$
which captures cycles that are not boundaries. The dimension of $H_d(K)$, called the d-th Betti number $\beta_d$, counts the number of independent d-dimensional holes. It is calculated as $\beta_d = \dim(Z_d(K)) - \dim(B_d(K))$. For $d \ge 1$, $\beta_d$ is interpreted as counting the number of independent loops ($\beta_1$) and higher-dimensional voids ($\beta_d$ for $d > 1$). In ordinary homology, $\beta_0$ counts the number of connected components. However, in some contexts, particularly for single-point spaces, reduced homology is used, where the zeroth reduced Betti number is one less than the number of connected components [23].
Persistent homology extends homology to filtrations, allowing us to track how topological features appear and disappear across multiple scales. As we build the simplicial complex in the filtration, topological features are both born and destroyed. For example, new connected components emerge (increasing the Betti number $\beta_0$) and can later merge with other components (decreasing $\beta_0$). Similarly, new loops can be created by the addition of edges (increasing $\beta_1$) and are destroyed when a triangle fills them in (decreasing $\beta_1$). This dynamic process is captured in a persistence diagram, which records the birth and death of each topological feature, providing a concise summary of its lifespan across the filtration. Given a filtration
$$\emptyset = K_0 \subseteq K_1 \subseteq \cdots \subseteq K_N = K,$$
the inclusion maps $K_a \hookrightarrow K_b$ induce homomorphisms between homology groups:
$$\iota_d^{a,b} : H_d(K_a) \to H_d(K_b).$$
Each homology class can be associated with a birth time, $b_x$, indicating when it first appears in the filtration, and a death time, $d_x$, indicating when it either becomes trivial or merges into an older feature. Collecting these persistence intervals, $[b_x, d_x)$, across all dimensions yields the persistence diagram for a given homological degree d, which provides a concise summary of the multiscale topological features present in the data.
Although persistent homology is a highly general framework that can be applied to filtrations of any topological space, in this work, we focus on filtrations constructed from scalar functions defined on the nodes of a graph. A simple and widely adopted method for building such filtrations is the lower-star filtration, which constructs nested subcomplexes based on real-valued functions assigned to the vertices. This approach is especially advantageous when distance metrics are not available or do not make sense, or when meaningful scalar attributes—such as structural properties or centrality scores—can be derived from the graph and assigned to its nodes.
Let $G = (V, E; g)$ be an undirected graph, where V is the set of nodes, $E \subseteq \{\{u, v\} \mid u, v \in V,\ u \neq v\}$ is the set of edges, and $g : V \to \mathbb{R}$ assigns real-valued attributes to the nodes. This graph can be naturally interpreted as a one-dimensional simplicial complex; the nodes are 0-simplices and the edges are 1-simplices. To capture more complex topological structures beyond pairwise connections, we extend this to higher-dimensional simplices. For instance, a triangle formed by three mutually connected nodes corresponds to a 2-simplex, a tetrahedron formed by four fully connected nodes corresponds to a 3-simplex, and so forth. The resulting complex, often called a clique complex or flag complex, encodes all complete subgraphs of the original graph as simplices. For notational simplicity, we continue to denote this extended simplicial complex by G.
To analyze the topology of G, we build a filtration by extending the function g from the vertices to all simplices using the maximum rule. Specifically, for any simplex $\tau \in G$ (viewed as a simplicial complex), we define $g(\tau) = \max\{g(v) : v \in \tau\}$. This extension ensures monotonicity; if $\sigma \subseteq \tau$, then $g(\sigma) \le g(\tau)$. As a result, the sublevel sets of g form valid simplicial complexes, which allows us to define a filtration. The resulting lower-star filtration is defined as a sequence of nested subcomplexes $G_t = \{\tau \in G \mid g(\tau) \le t\}$, where the filtration parameter t increases over the range of values taken by g. In this construction, a simplex enters the filtration at the smallest threshold t such that all its vertices satisfy $g(v) \le t$. Because the value on a simplex depends on the maximum among its vertices, this method provides a consistent, hierarchical way to explore the topological features of the graph.
In the Lower-Star filtration, the choice of function values assigned to the vertices is critical, as it directly determines how the filtration unfolds and what topological features are captured. These function values act as filtration parameters, deciding the exact stage at which each simplex enters the filtration, ultimately shaping the topological summary we obtain. In this work, we define the vertex function g based on three types of node attributes: closeness centrality, betweenness centrality, and a custom measure we call degree2 centrality. Closeness centrality measures how close a node is, on average, to all other nodes in the network, based on the shortest-path distances. When analyzing unweighted graphs, each edge is treated as having unit length, and distances between nodes are computed as the number of edges along the shortest path. This node-level measure reflects the overall accessibility or influence of a node within the graph. Betweenness centrality measures how often a node lies on the shortest paths between other nodes, highlighting nodes that serve as bridges or bottlenecks within the graph. The degree2 centrality metric is defined as the count of unique nodes connected to a given node through paths of length one or two. By incorporating nodes at a distance of two hops, this measure provides a more extensive characterization of a node’s local neighborhood structure than that offered by standard degree centrality.
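To make the three filter functions concrete, the following sketch shows one way to compute them in R with the igraph package; the helper name compute_filters and the example graph are our own illustration, not the paper's code. The degree2 score is obtained here by counting the nodes reachable within two hops of a vertex, excluding the vertex itself, which matches the description above.

```r
library(igraph)

# A minimal sketch (assuming the igraph package): compute the three vertex
# filter functions used to build lower-star filtrations. The helper name
# compute_filters is ours, not from the paper's code.
compute_filters <- function(graph) {
  list(
    closeness   = closeness(graph),        # shortest-path closeness centrality
    betweenness = betweenness(graph),      # shortest paths passing through each node
    # degree2: nodes reachable within two hops, excluding the node itself
    degree2     = ego_size(graph, order = 2) - 1
  )
}

# Example on a random graph
set.seed(1)                                # illustrative seed
g <- sample_gnp(100, 0.1)
str(compute_filters(g))
```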
Each of these filter functions induces a distinct ordering of simplices, which governs how connected components, cycles, and other topological features appear and persist across scales. Applying persistent homology to these filtrations yields a persistence diagram that records the birth and death of each topological feature as the threshold t varies. These diagrams provide compact, multi-scale summaries of the graph’s topological structure relative to the chosen function g. (See Example 2 for an illustration of how lower-star filtrations can be used to compute persistence diagrams.)
Example 2. 
Consider a graph $G = (V, E)$ with vertex set $V = \{a, b, c, d, e, f, g\}$ and edge set $E = \{(a,b), (b,c), (c,d), (d,a), (b,e), (c,e), (d,f), (b,g)\}$. This graph can be viewed as a 1-dimensional simplicial complex with seven vertices (0-simplices) and eight edges (1-simplices). We extend this complex by adding a 2-simplex, the triangle $T = \{b, c, e\}$, as shown in Figure 3. To analyze the topological structure of this complex, we define a filter function $g : V \to \mathbb{N}$ based on the degree of each vertex, such that $g(v) = \deg(v)$, where $\deg(v)$ denotes the number of edges incident to vertex v. The degree values for each vertex are:
$$g(f) = g(g) = 1, \quad g(a) = g(e) = 2, \quad g(c) = g(d) = 3, \quad g(b) = 4.$$
We construct the lower-star filtration of the graph step by step. At each filtration step t, we include all simplices $\sigma \subseteq V$ such that $\max_{v \in \sigma} g(v) \le t$.
Step $t = 1$: We begin with an empty complex $G_0 = \emptyset$. At a filtration value of 1, we add vertices f and g since $g(f) = g(g) = 1$. No edges are included at this stage. Thus, the complex is $G_1 = G_0 \cup \{\{f\}, \{g\}\}$.
Step $t = 2$: We include vertices a and e, as their degree values are $g(a) = g(e) = 2$. No edges are added because all neighboring vertices have higher degree values. The complex is now $G_2 = G_1 \cup \{\{a\}, \{e\}\}$.
Step $t = 3$: We add vertices c and d, with degree values of 3. At this step, several edges are also included: $(c,e)$, $(c,d)$, $(a,d)$, and $(d,f)$, as the maximum degree of their endpoints is at most 3. The complex becomes $G_3 = G_2 \cup \{\{c\}, \{d\}, \{c,e\}, \{c,d\}, \{a,d\}, \{d,f\}\}$.
Step $t = 4$: Finally, we add vertex b and all remaining simplices. All edges with a maximum vertex degree of 4—$(a,b)$, $(b,c)$, $(b,e)$, and $(b,g)$—are added. The 2-simplex, the triangle $\{b,c,e\}$, is also added since its maximum vertex degree is 4. The final complex is $G_4 = G_3 \cup \{\{b\}, \{a,b\}, \{b,c\}, \{b,e\}, \{b,g\}, \{b,c,e\}\}$. Figure 4 visually demonstrates how the scalar function g induces a lower-star filtration on the graph G, as described above.
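For readers who want to verify the filtration values above, the following hedged R sketch (using igraph; the variable names are ours) reproduces them by assigning each simplex the maximum degree of its vertices.

```r
library(igraph)

# A small sketch (our own illustration, assuming igraph): reproduce the
# filtration values of Example 2 by assigning each simplex the maximum
# degree of its vertices.
g_ex <- graph_from_edgelist(rbind(
  c("a","b"), c("b","c"), c("c","d"), c("d","a"),
  c("b","e"), c("c","e"), c("d","f"), c("b","g")), directed = FALSE)

deg <- degree(g_ex)                       # vertex filtration values g(v) = deg(v)

# Entry value of each edge = max degree of its two endpoints
edges <- as_edgelist(g_ex)
edge_entry <- apply(edges, 1, function(e) max(deg[e]))
names(edge_entry) <- paste(edges[, 1], edges[, 2], sep = "-")

# Entry value of the 2-simplex {b, c, e}
tri_entry <- max(deg[c("b", "c", "e")])

print(deg)          # f = g = 1, a = e = 2, c = d = 3, b = 4
print(edge_entry)   # c-e, c-d, a-d, d-f enter at t = 3; the remaining edges at t = 4
print(tri_entry)    # the triangle enters at t = 4
```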
To compute the PD, we observe how topological features evolve as the filtration parameter t increases. At $t = 1$, two connected components are born; one persists indefinitely, while the other dies at $t = 4$, when the edge $(b, g)$ merges the two remaining components. At $t = 2$, two more connected components are born; both of these die at $t = 3$ as they merge with existing components via edge formation. At $t = 4$, four more edges and a triangle (2-simplex) are added, which results in the birth of a loop. Since this loop is never filled in by a triangle at any later stage in the filtration, it persists indefinitely and is assigned an infinite death time in the PD. The PD resulting from the lower-star filtration is visualized in Figure 5 and detailed numerically in Table 1. In practical applications, features with infinite death values—representing topological structures that never disappear—are handled differently depending on the problem at hand. They are often either excluded from the diagram or assigned a suitable finite constant to facilitate analysis and computation. However, in scenarios where the PD is used directly (e.g., with kernel methods or distance-based comparisons), it is often preferable to retain infinite death values, as they may carry meaningful topological information about essential or dominant features of the space.
Figure 5. Visual representation of the persistence diagram shown in Table 1, where each point corresponds to a topological feature characterized by its birth and death values.
Table 1. Persistence diagram for the lower-star filtration of the graph in Figure 3. Each row represents a topological feature with its dimension, birth time, and death time. Infinite death values indicate features that persist throughout the entire filtration.
Dimension   Birth   Death
0           1       ∞
0           1       4
0           2       3
0           2       3
1           4       ∞

2.2. Distance Metrics and Stability of Persistence Diagrams

A standard metric for comparing PDs is the p-Wasserstein distance ($W_p$), defined for $p \ge 1$. Given two finite PDs, $D_1$ and $D_2$, the p-Wasserstein distance is defined as:
$$W_p(D_1, D_2) = \left( \inf_{\gamma} \sum_{u \in D_1} \| u - \gamma(u) \|_{\infty}^{p} \right)^{1/p},$$
where $\|\cdot\|_{\infty}$ is the $L^{\infty}$ norm on $\mathbb{R}^2$, and $\gamma$ ranges over all bijections (matchings) between the diagrams. To ensure a perfect matching always exists, both $D_1$ and $D_2$ are augmented with countably infinite copies of the diagonal, $\Delta = \{(b, d) \in \mathbb{R}^2 \mid b = d\}$. This allows unmatched off-diagonal points to be paired with their orthogonal projections onto the diagonal, effectively absorbing topological noise. The bottleneck distance ($d_B$) is a special case of this metric. It measures the maximum distance between matched points under the best possible bijection:
$$d_B(D_1, D_2) = \inf_{\gamma} \sup_{u \in D_1} \| u - \gamma(u) \|_{\infty}.$$
The bottleneck distance is also the limit of the p-Wasserstein distance as $p \to \infty$. The fundamental stability of persistence diagrams is guaranteed by these metrics. The stability theorem, first established for the bottleneck distance by Cohen-Steiner, Edelsbrunner, and Harer [24], states that small perturbations in the scalar functions used to construct filtrations lead to only small changes in the resulting PDs. Formally, if f and g are two tame functions (a function $f : X \to \mathbb{R}$ is tame if it has a finite number of homological critical values and the homology groups $H_k\big(f^{-1}((-\infty, a])\big)$ are finite-dimensional for all $k \in \mathbb{Z}$ and $a \in \mathbb{R}$ [24]), the bottleneck distance between their PDs is bounded by the sup-norm difference between the functions:
$$d_B\big(D(f), D(g)\big) \le \| f - g \|_{\infty}.$$
This robustness holds for the p-Wasserstein distances as well, providing a principled way to quantify topological differences that is reliable even with noisy data.
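As an illustration of these metrics in practice, the sketch below compares two toy diagrams using the R package TDA; we assume its bottleneck() and wasserstein() functions accept diagrams supplied as three-column (dimension, birth, death) matrices, as produced by its own filtration routines, and the toy diagrams themselves are ours.

```r
library(TDA)

# A hedged sketch: comparing two small persistence diagrams with the TDA
# package. The toy diagrams below are placeholders; in practice they would
# come from the lower-star filtrations described in this section.
D1 <- rbind(c(0, 1, 4), c(0, 2, 3), c(1, 4, 6))
D2 <- rbind(c(0, 1, 5), c(0, 2, 3), c(1, 4, 7))
colnames(D1) <- colnames(D2) <- c("dimension", "Birth", "Death")

# Bottleneck and 1-Wasserstein distances restricted to H0 features
bottleneck(D1, D2, dimension = 0)
wasserstein(D1, D2, p = 1, dimension = 0)
```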

2.3. Vector Summary of Persistence Diagrams

Persistence diagrams offer a powerful way to summarize topological features in data. However, because the space of PDs is not a Hilbert space and PDs are multisets of varying sizes, they cannot be directly used in many statistical or machine learning methods. The metrics used to compare PDs, such as the Wasserstein or bottleneck distances, are also computationally expensive. Therefore, various vectorization techniques have been developed to transform PDs into finite-dimensional representations that are more compatible with standard algorithms. These vectorization methods include persistence landscapes [25,26,27,28], persistence images [29], persistence silhouettes [28], persistence entropy [30], persistence statistics [31], and Betti functions [32,33,34]. Many of these methods come with their own stability guarantees, ensuring that the vector representation remains robust to noise. In this work, we adopt three of these methods, providing a brief overview of their construction. Their implementations are available in the TDAvec package [35].
Definition 1 
(Persistence Statistics). Given a PD, $D = \{(b_i, d_i)\}_{i=1}^{N}$, where $b_i$ and $d_i$ denote the birth and death times of the ith topological feature, persistence statistics refer to a collection of descriptive measures derived from the birth values $\{b_i\}$, death values $\{d_i\}$, midpoints $\frac{b_i + d_i}{2}$, and lifespans (or persistences) $\{d_i - b_i\}$. These statistics include the following:
  • Measures of central tendency and dispersion, such as the mean, standard deviation, median, interquartile range, range, and selected quantiles (10th, 25th, 75th, and 90th percentiles) for each of the four value types (birth, death, midpoint, and lifespan).
  • The total number of points (or bars) in the diagram.
  • The entropy of the lifespan distribution, defined as
    $$H = -\sum_{i=1}^{N} \frac{l_i}{L} \log_2\!\left(\frac{l_i}{L}\right),$$
    where $l_i = d_i - b_i$ represents the lifespan of the ith feature and $L = \sum_{i=1}^{N} l_i$ is the total sum of lifespans.
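The following base-R sketch recomputes these statistics for a diagram stored as a matrix of (birth, death) pairs; it is our own transparent re-implementation for illustration, not the TDAvec routine used in the experiments.

```r
# An illustrative base-R sketch of persistence statistics (Definition 1) for
# a diagram given as a two-column (birth, death) matrix in a fixed dimension.
persistence_stats <- function(pd) {
  birth <- pd[, 1]; death <- pd[, 2]
  mid   <- (birth + death) / 2
  life  <- death - birth

  summarize <- function(x) c(
    mean = mean(x), sd = sd(x), median = median(x),
    iqr = IQR(x), range = diff(range(x)),
    quantile(x, probs = c(0.10, 0.25, 0.75, 0.90))
  )

  p <- life / sum(life)                     # normalized lifespans
  c(unlist(lapply(list(birth = birth, death = death,
                       midpoint = mid, lifespan = life), summarize)),
    n_points = nrow(pd),
    entropy  = -sum(p * log2(p)))           # lifespan entropy H
}

# Example on the H0 features of Table 1 with finite deaths
pd0 <- rbind(c(1, 4), c(2, 3), c(2, 3))
persistence_stats(pd0)
```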
Definition 2 
(Betti Function (also called a Betti curve)). Let $D = \{(b_i, d_i)\}_{i=1}^{N}$ be a PD associated with a fixed homological dimension k, where each pair $(b_i, d_i)$ represents the birth and death of a k-dimensional topological feature. The Betti function $\beta_k(t)$ is defined as:
$$\beta_k(t) = \sum_{i=1}^{N} \mathbf{1}_{[b_i, d_i)}(t),$$
where $\mathbf{1}_{[b_i, d_i)}(t)$ is the indicator function:
$$\mathbf{1}_{[b_i, d_i)}(t) = \begin{cases} 1 & \text{if } b_i \le t < d_i, \\ 0 & \text{otherwise}. \end{cases}$$
This function counts the number of k-dimensional features that are alive at a given filtration value t; specifically, the features that have emerged (been “born”) but have not yet disappeared (“died”) at time t. A common approach to vectorizing the Betti function is to evaluate it at each point in an increasing sequence of scale values $\{t_1, t_2, \ldots, t_n\}$, resulting in a vector in $\mathbb{R}^n$:
$$\big( \beta_k(t_1), \beta_k(t_2), \ldots, \beta_k(t_n) \big) \in \mathbb{R}^n.$$
Because the Betti function is piecewise constant and integrable, it can also be summarized by averaging over consecutive intervals. This produces a vector in $\mathbb{R}^{n-1}$ of the form:
$$\left( \frac{1}{\Delta t_1} \int_{t_1}^{t_2} \beta_k(t)\, dt,\; \frac{1}{\Delta t_2} \int_{t_2}^{t_3} \beta_k(t)\, dt,\; \ldots,\; \frac{1}{\Delta t_{n-1}} \int_{t_{n-1}}^{t_n} \beta_k(t)\, dt \right),$$
where $\Delta t_k = t_{k+1} - t_k$ for $k = 1, \ldots, n-1$. While the two vectorization methods can yield similar results on a sufficiently dense grid, the interval-averaging method offers distinct advantages, particularly in terms of stability and robustness. The point-wise evaluation method (Equation (8)) is known to be unstable with respect to the 1-Wasserstein distance, as a small perturbation in the input data can lead to a large change in the output vector [36]. In contrast, the interval-averaging vectorization considers the behavior of the Betti function over the entire intervals determined by neighboring scale values. This approach is more robust to small perturbations and captures the contribution of features that appear and disappear between sample points. This vectorization method can also be used to compute informative low-dimensional vectors based on a sparse grid of scale values, which helps mitigate noise and reduces the dimensionality of the feature representation [6]. (See the experiment of Section 3.1 in [6] which compares the two vectorization methods of the Betti function when the extracted vector summaries are low-dimensional.)
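The two vectorizations can be illustrated with a short base-R sketch of our own; the integral in the interval-averaged version is approximated on a fine subgrid rather than computed exactly.

```r
# An illustrative base-R sketch (not the TDAvec implementation) of the two
# Betti-function vectorizations: point-wise evaluation on a grid and
# averaging over consecutive grid intervals. pd is a (birth, death) matrix
# for a fixed homological dimension.
betti_function <- function(t, pd) {
  sum(pd[, 1] <= t & t < pd[, 2])           # number of features alive at t
}

betti_vectorize <- function(pd, grid) {
  pointwise <- sapply(grid, betti_function, pd = pd)

  # Interval averages (1 / Δt_k) ∫ β(t) dt, approximated on a fine subgrid
  averaged <- sapply(seq_len(length(grid) - 1), function(k) {
    ts <- seq(grid[k], grid[k + 1], length.out = 100)
    mean(sapply(ts, betti_function, pd = pd))
  })
  list(pointwise = pointwise, averaged = averaged)
}

pd0  <- rbind(c(1, 4), c(2, 3), c(2, 3))    # H0 features of Table 1 (finite deaths)
grid <- seq(0, 5, by = 1)
betti_vectorize(pd0, grid)
```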
Definition 3 
(Generalized Persistence Landscape (the generalized persistence landscape is a kernel-based variant of the standard persistence landscape; see [35] for more details and implementations)). Let $D = \{(b_i, d_i)\}_{i=1}^{N}$ be a PD in a fixed homological dimension k, where each pair $(b_i, d_i) \in \mathbb{R}^2$ represents the birth and death of a topological feature. The jth-order persistence landscape function is defined as follows:
$$\lambda_j(t) = j\text{-}\max_{1 \le i \le N} \Lambda_i(t), \quad j = 1, \ldots, N,$$
where $j\text{-}\max$ denotes the jth largest value among the set $\{\Lambda_1(t), \Lambda_2(t), \ldots, \Lambda_N(t)\}$. Each $\Lambda_i(t)$ is a piecewise-linear, triangular “tent” function defined as:
$$\Lambda_i(t) = \begin{cases} t - b_i & \text{if } t \in \left[b_i, \frac{b_i + d_i}{2}\right], \\ d_i - t & \text{if } t \in \left(\frac{b_i + d_i}{2}, d_i\right], \\ 0 & \text{otherwise}. \end{cases}$$
Each triangle peaks at $t = \frac{b_i + d_i}{2}$ with height $\frac{d_i - b_i}{2}$, capturing the lifespan of the corresponding topological feature.
To obtain a finite-dimensional representation suitable for machine learning, each landscape function $\lambda_j(t)$ is sampled at a predefined sequence of increasing filtration values $\{t_1, t_2, \ldots, t_n\}$, resulting in a vector of the form:
$$\big( \lambda_j(t_1), \lambda_j(t_2), \ldots, \lambda_j(t_n) \big) \in \mathbb{R}^n.$$
The generalized persistence landscape extends the traditional definition by replacing the sharp triangular functions $\Lambda_i(t)$ with smooth kernel-based bump functions. Let $K : \mathbb{R} \to \mathbb{R}$ be a symmetric kernel supported on the interval $[-1, 1]$, and define its scaled version with bandwidth $h > 0$ as $K_h(t) = \frac{1}{h} K\!\left(\frac{t}{h}\right)$.
Then, for each point $(b_i, d_i)$, the smoothed function $\Lambda_i(t)$ is defined by:
$$\Lambda_i(t) = \begin{cases} \dfrac{d_i - b_i}{2\, K_h(0)} \cdot K_h\!\left(t - \dfrac{b_i + d_i}{2}\right) & \text{if } \left| \dfrac{t - (b_i + d_i)/2}{h} \right| \le 1, \\[1ex] 0 & \text{otherwise}. \end{cases}$$
As before, the generalized landscape functions $\lambda_j(t)$ are defined in the same way. Common choices for the kernel K include the triangle kernel, the Epanechnikov kernel, and the tricubic kernel, each offering different smoothness properties and bandwidth sensitivities.
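A minimal base-R sketch of the generalized landscape with an Epanechnikov kernel is given below; it mirrors the settings used later in the experiments (a grid of 100 points on [0, 2] and bandwidth h = 0.2) but is our own illustration rather than the TDAvec implementation.

```r
# A base-R sketch of the generalized persistence landscape with an
# Epanechnikov kernel of bandwidth h, evaluated on a grid for order j.
epanechnikov <- function(u) ifelse(abs(u) <= 1, 0.75 * (1 - u^2), 0)
K_h <- function(t, h) epanechnikov(t / h) / h

gen_landscape <- function(pd, grid, j = 1, h = 0.2) {
  sapply(grid, function(t) {
    # smoothed bump for each diagram point; peak height (d - b) / 2
    bumps <- apply(pd, 1, function(pt) {
      b <- pt[1]; d <- pt[2]; mid <- (b + d) / 2
      (d - b) / (2 * K_h(0, h)) * K_h(t - mid, h)
    })
    sort(bumps, decreasing = TRUE)[j]       # j-th largest value at t
  })
}

pd0  <- rbind(c(0.1, 0.6), c(0.2, 0.4))     # toy diagram on a [0, 2] scale
grid <- seq(0, 2, length.out = 100)         # scale sequence used in the paper
lam1 <- gen_landscape(pd0, grid, j = 1, h = 0.2)
head(round(lam1, 3))
```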
Figure 6 provides a graphical illustration of these summary functions: (a) Persistence diagram —displays the birth and death times of topological features as points in the plane. (b) Persistence statistics—summarizes information from the persistence diagram using numerical descriptors such as mean, standard deviation, and maximum lifetime. (c) Betti function—shows how the number of topological features (e.g., connected components or loops) changes over the filtration process. (d) Generalized persistence landscape—encodes topological features into functional representations that are suitable for statistical comparison and integration into machine learning models. Unlike classical persistence landscapes, the generalized version may include smoothing or nonlinear transformations and is not necessarily piecewise-linear.

3. Proposed Nested Ensemble Learning Framework

In this section, we present our proposed nested ensemble learning framework, designed to integrate topological features extracted from graph data using a hierarchical stacking approach. Stacking is an ensemble technique in which the predictions of multiple base models are used as input features for one or more higher-level meta-learners, which aim to learn optimal combinations of these predictions. Our framework extends traditional stacking by introducing additional layers, resulting in a three-level architecture: Level-0 (Base Models), Level-1 (Filter-Specific Ensembles), and Level-2 (Cross-Filter Ensemble). Level-1 meta-learners combine the outputs of base models within each filter to form filter-specific ensembles. The final nested ensemble prediction is then produced by a Level-2 meta-learner, which learns to optimally weight and integrate predictions from the different filter-specific ensembles. The specific choice of meta-learners at both levels depends on the complexity and characteristics of the data.
Let $G = (V, E)$ denote a graph with node set V and edge set E. For each node $v \in V$, we compute three distinct node attributes (filter functions): closeness centrality ($f_{\text{closeness}}(v)$), degree2 centrality ($f_{\text{degree2}}(v)$), and betweenness centrality ($f_{\text{betweenness}}(v)$).
For each graph $G_i$ and each filter function $f \in \{f_{\text{closeness}}, f_{\text{degree2}}, f_{\text{betweenness}}\}$, we compute a persistence diagram $D_i^{(f)}$ via a lower-star filtration. This yields:
$$D^{(f)} = \{D_1^{(f)}, D_2^{(f)}, \ldots, D_N^{(f)}\},$$
capturing the topological features of all graphs under filter f.
Each $D_i^{(f)}$ is vectorized into a fixed-dimensional feature representation using summary functions (e.g., persistence statistics, Betti curves, generalized persistence landscapes). This produces a feature matrix:
$$X^{(f)} \in \mathbb{R}^{N \times d_f},$$
where N is the number of samples (graphs) and $d_f$ is the feature dimension associated with filter f. This is performed separately for the training and test sets, yielding
$$X_{\text{train}}^{(f)}, \; X_{\text{test}}^{(f)} \quad \text{alongside labels} \quad y_{\text{train}}, \; y_{\text{test}}.$$
We propose a three-level ensemble learning procedure, described as follows:
  • Base Model Training (Level-0):
    For each filter $f \in \{f_{\text{closeness}}, f_{\text{degree2}}, f_{\text{betweenness}}\}$, we independently train M diverse base models $\{h_1^{(f)}, h_2^{(f)}, \ldots, h_M^{(f)}\}$ on the training feature matrix $X_{\text{train}}^{(f)}$. Each base model $h_m^{(f)}$ is a learning algorithm (e.g., a neural network, support vector machine, or gradient boosting model) that maps $\mathbb{R}^{d_f} \to \mathbb{R}$. The predictions from each base model are then given by:
    $$\hat{y}_{j,\text{train}}^{(f)} = h_j^{(f)}\!\left(X_{\text{train}}^{(f)}\right), \quad j = 1, \ldots, M.$$
  • Filter-specific Ensemble (Level-1): For each filter function f, the predictions from the M base models are aggregated into a matrix:
    $$Z_{\text{train}}^{(f)} = \left[\, \hat{y}_{1,\text{train}}^{(f)} \;\; \hat{y}_{2,\text{train}}^{(f)} \;\; \cdots \;\; \hat{y}_{M,\text{train}}^{(f)} \,\right] \in \mathbb{R}^{N \times M}.$$
    A meta-learner $g^{(f)} : \mathbb{R}^{M} \to \mathbb{R}$ is then trained to combine these base-model predictions. This meta-learner takes the outputs of the Level-0 models as input features and learns the optimal way to assign weights to each base model based on their predictions, producing a single, refined prediction for each sample. The resulting predictions for the training data are:
    $$\tilde{y}_i^{(f)} = g^{(f)}\!\left(Z_i^{(f)}\right), \quad i = 1, \ldots, N,$$
    where $Z_i^{(f)}$ denotes the ith row of $Z_{\text{train}}^{(f)}$. The choice of meta-learner $g^{(f)}$ depends on the complexity of the task; simple models like logistic regression may suffice, while more flexible methods such as tree-based models or shallow neural networks can be used when needed.
    These refined outputs form the filter-specific ensemble prediction:
    $$\tilde{y}_{\text{train}}^{(f)} = \left( \tilde{y}_1^{(f)}, \tilde{y}_2^{(f)}, \ldots, \tilde{y}_N^{(f)} \right).$$
    This process is repeated for each filter function, resulting in three separate filter-specific ensembles:
    $$E^{(\text{closeness})}, \quad E^{(\text{degree2})}, \quad E^{(\text{betweenness})}.$$
  • Cross-filter Ensemble (Level-2): At the second level, we combine the outputs from the three filter-specific ensembles into a new training set. For the training data, we form a matrix:
    $$W_{\text{train}} = \begin{bmatrix} \tilde{y}_1^{(\text{closeness})} & \tilde{y}_1^{(\text{degree2})} & \tilde{y}_1^{(\text{betweenness})} \\ \vdots & \vdots & \vdots \\ \tilde{y}_N^{(\text{closeness})} & \tilde{y}_N^{(\text{degree2})} & \tilde{y}_N^{(\text{betweenness})} \end{bmatrix} \in \mathbb{R}^{N \times 3}.$$
    Similarly, for the test data we construct:
    $$W_{\text{test}} = \begin{bmatrix} \tilde{y}_{N+1}^{(\text{closeness})} & \tilde{y}_{N+1}^{(\text{degree2})} & \tilde{y}_{N+1}^{(\text{betweenness})} \\ \vdots & \vdots & \vdots \\ \tilde{y}_{N+T}^{(\text{closeness})} & \tilde{y}_{N+T}^{(\text{degree2})} & \tilde{y}_{N+T}^{(\text{betweenness})} \end{bmatrix} \in \mathbb{R}^{T \times 3},$$
    where T denotes the number of test graphs. Another meta-learner $G : \mathbb{R}^{3} \to \mathbb{R}$ is trained on the outputs of the filter-specific ensembles, using the training set $(W_{\text{train}}, y_{\text{train}})$. As a distinct learning algorithm, G takes as input a three-dimensional vector containing the predictions from the filter-specific ensembles ($E^{(\text{closeness})}$, $E^{(\text{degree2})}$, $E^{(\text{betweenness})}$) and learns to assign adaptive weights to these inputs. The final nested ensemble prediction is then produced as:
    $$\hat{y}_{\text{final}} = G\!\left( E^{(\text{closeness})}\!\left(X^{(\text{closeness})}\right),\; E^{(\text{degree2})}\!\left(X^{(\text{degree2})}\right),\; E^{(\text{betweenness})}\!\left(X^{(\text{betweenness})}\right) \right).$$
    The optimization objective is to minimize the generalization error:
    $$\min_{G,\, \{g^{(f)}\},\, \{h_j^{(f)}\}} \; \mathbb{E}_{(X, y)}\!\left[\, L\!\left(y, \hat{y}_{\text{final}}\right) \right],$$
    where L is a suitable loss function (e.g., mean squared error for regression, cross-entropy for classification).
The optimization process trains the Level-2 meta-learner to assign dynamic, context-aware weights to each filter-specific ensemble prediction. In this manner, the framework effectively synthesizes complementary topological signals into a single, unified output while minimizing prediction error. This integration strategy is designed to preserve the local discriminative power of individual filter ensembles while simultaneously capturing the global relationships between different topological perspectives. As a result, the approach yields more robust predictions than any single-filter method. A schematic overview of the full nested ensemble architecture is provided in Figure 7.
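To make the three levels concrete, the following condensed R sketch outlines the procedure for a binary classification task using the caret package. The feature list feats (one vectorized matrix per filter) and label vector y are assumptions of this sketch, and the out-of-fold prediction handling normally used in stacking is omitted for brevity, so this is a schematic of the architecture rather than the exact training protocol.

```r
library(caret)

# A condensed, hedged sketch of the three-level procedure (binary
# classification), assuming precomputed topological feature matrices
# feats[[f]] (one per filter) and a two-level factor label vector y.
filters      <- c("closeness", "degree2", "betweenness")
base_methods <- c("treebag", "rf", "gbm")      # Level-0 learners

fit_filter_ensemble <- function(X, y) {
  base_fits <- lapply(base_methods, function(m)
    train(x = X, y = y, method = m,
          trControl = trainControl(method = "cv", number = 5)))
  # Level-1 meta-learner (GLM) on the base-model class probabilities;
  # note: in-sample predictions are used here only to keep the sketch short.
  Z <- as.data.frame(lapply(base_fits, function(fit)
    predict(fit, newdata = X, type = "prob")[, 2]))
  names(Z) <- base_methods
  meta <- train(x = Z, y = y, method = "glm")
  list(base = base_fits, meta = meta)
}

predict_filter_ensemble <- function(ens, X) {
  Z <- as.data.frame(lapply(ens$base, function(fit)
    predict(fit, newdata = X, type = "prob")[, 2]))
  names(Z) <- base_methods
  predict(ens$meta, newdata = Z, type = "prob")[, 2]
}

# Level-1: one stacked ensemble per filter
ensembles <- lapply(feats[filters], fit_filter_ensemble, y = y)

# Level-2: Random Forest meta-learner across the three filter ensembles
W <- as.data.frame(lapply(filters, function(f)
  predict_filter_ensemble(ensembles[[f]], feats[[f]])))
names(W) <- filters
nested <- train(x = W, y = y, method = "rf")
```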

4. Experimental Setup

In this section, we evaluate the performance of the proposed nested ensemble framework by comparing it against both the individual base models and the filter-specific ensembles (i.e., stacking ensembles). The evaluation is carried out on both simulated and real-world datasets, across classification and regression tasks. For the classification experiments, we further benchmark our approach against several state-of-the-art methods to highlight its competitive effectiveness. To vectorize the PDs, we use three summary functions described in Section 2.3: persistence statistics (PS), the Betti function (BF), and generalized persistence landscapes (PLs). For both the BF and PL representations, we compute values over a fixed scale sequence of 100 evenly spaced points ranging from 0 to 2. In the case of generalized persistence landscapes, we additionally apply a kernel smoothing procedure based on an Epanechnikov kernel with a bandwidth parameter set to h = 0.2 .
In the simulation study, we generate synthetic graphs using three well-established models: the Erdős–Rényi model (ERM) [37], the stochastic block model (SBM) [38] and the Watts–Strogatz model [39]. Graphs generated from the Erdős–Rényi model and stochastic block model are employed to evaluate classification performance, while those generated by the Watts–Strogatz model are used to assess regression performance.
For the regression tasks, we compare the performance of the proposed nested ensemble model against both individual base models and stacking ensembles, using two distinct feature extraction approaches. The first approach leverages topological features derived from persistent homology, where lower-star filtrations are constructed by using node attributes as filter functions. The second approach is purely non-topological, relying solely on quantile-based summaries of the node attributes. To further validate our methodology, we evaluate our model on four widely used benchmark datasets for graph classification—IMDB-BINARY, IMDB-MULTI, REDDIT-BINARY, and PROTEINS—all sourced from the TUDataset collection [18]. These datasets, which are constructed from social and biological network structures, are frequently used in fields such as biomedical and biological network analysis. Descriptive statistics and additional details for each dataset are provided in Table 8. We also compare the performance of our proposed approach against several established graph classification methods [40,41,42,43,44]. All analyses were conducted using the R statistical software 4.1.1 environment, and all machine learning models were trained using default hyperparameter settings unless otherwise specified.

4.1. Simulation Studies

To evaluate the performance and robustness of our proposed nested ensemble framework across diverse network structures, we conducted two sets of simulation experiments. The first experiment uses Erdős–Rényi graphs to model homogeneous random networks, while the second employs stochastic block models (SBMs) to generate networks with well-defined community structures. Together, these experiments highlight the versatility of our approach across a broad spectrum of graph topologies.

4.1.1. Experiment on Erdős–Rényi Graphs

Before evaluating the performance of the proposed model on Erdős–Rényi graphs, we first assess the computational cost and scalability of our method for extracting topological features from graph data. This evaluation is carried out under two controlled conditions. The first condition varies the number of graphs (N) while keeping the edge probability (p) for each class fixed. The second condition fixes the number of graphs (N) and varies the edge probabilities (p) across classes. For each generated graph, node-level attributes—including degree2, closeness, and betweenness—are first computed to serve as filter functions. Subsequently, lower-star filtrations are constructed based on these attributes, and PDs are computed from the filtrations. In all cases, the number of nodes in each graph is uniformly sampled from the interval [ 80 , 120 ] . The results are presented in Table 2a,b, which provide the median run-time costs (measured in seconds and computed based on ten repeated simulations) for these two studies. Table 2a demonstrates that the median run-time for computing PDs scales efficiently and nearly linearly with the number of graphs N when graph density is fixed.
For example, processing 100 graphs took approximately 1.70 s, while 1000 graphs required about 15.88 s. In contrast, Table 2b reveals that run-time varies considerably with graph complexity. With the number of graphs fixed at 100, randomly sampling the edge probabilities ( p 1 , p 2 ) resulted in median run-times ranging from 2.39 to 221.22 s. This demonstrates the pipeline’s computational efficiency with respect to dataset size, as well as its sensitivity to graph structure and edge density. We now proceed to evaluate its performance in a classification context.
In the first experiment, we generate synthetic graphs using the Erdős–Rényi model. The number of nodes n in each graph is uniformly sampled from the interval [ 80 , 120 ] . Each graph is assigned to one of two classes based on its edge probability; Class 1 uses p 1 = 0.35 , while Class 2 uses p 2 = 0.30 . The resulting dataset consists of 300 training graphs (150 per class) and 100 test graphs (50 per class). A summary of the average network properties, including density, average degree (AD), average shortest path distance (ASD), diameter, and transitivity, is provided in Table 3. These metrics underscore the structural differences between the two classes.
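A hedged sketch of this data-generating setup with igraph is shown below; the seed and helper name are ours, while the class probabilities, node-count range, and train/test sizes follow the description above.

```r
library(igraph)

# A hedged sketch of the Erdős–Rényi setup: two classes of G(n, p) graphs
# with p1 = 0.35 and p2 = 0.30, n drawn uniformly from [80, 120], and a
# 300-train / 100-test split as described in the text.
make_er_class <- function(n_graphs, p) {
  lapply(seq_len(n_graphs), function(i) {
    n <- sample(80:120, 1)
    sample_gnp(n, p)
  })
}

set.seed(1)  # illustrative seed, not from the paper
train_graphs <- c(make_er_class(150, 0.35), make_er_class(150, 0.30))
test_graphs  <- c(make_er_class(50,  0.35), make_er_class(50,  0.30))
train_labels <- factor(rep(c("class1", "class2"), each = 150))
test_labels  <- factor(rep(c("class1", "class2"), each = 50))
```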
We compute PDs using the lower-star filtration induced by a function g, whose values are normalized to the interval [0, 1]. As filter functions for g, we use closeness, betweenness, and the custom degree2 centrality introduced in Section 2.1.1. PDs are computed for each of these filter functions on both the training and test sets. These features, which capture the birth and death of topological features throughout the filtration process, are then vectorized.
For each filter function, we train three base machine learning models: Bagging, Random Forest, and Boosting. These base models are then combined using a generalized linear model (GLM) as a Level-1 meta-learner, which optimally weights their predictions to produce a single ensemble model for each filter function. Finally, a Random Forest (RF) is used as a Level-2 meta-learner to integrate the outputs of the three filter-specific ensembles, constructing the final nested ensemble model. Table 4 reports the classification accuracies (mean ± standard deviation) obtained from the simulated study. The results clearly demonstrate that the proposed nested ensemble model consistently outperforms both individual base models and stacked ensembles, regardless of the summary function (PS, BF, PL) or homology dimension ( H 0 or H 1 ) considered. For example, in the H 0 dimension with Persistence Statistics (PS), the nested ensemble achieved an accuracy of 74.6, representing a 2.1 % improvement over the best-performing base model (72.5) and a 1.8 % gain over the top-stacked ensembles (72.8). The most substantial improvement in H 0 was observed with the Persistence Landscape (PL), where the nested ensemble reached an accuracy of 75.8, outperforming the best individual base model (72.9) by 2.9 % and the best stacked ensembles (73.5) by 2.3 % . Performance gains were even more pronounced in the H 1 dimension. Using PS, the nested ensemble achieved an accuracy of 96.5, exceeding both the best individual base model and the stacked ensembles (each at 94.1) by 2.4 % . With the Betti functions (BFs), the nested ensemble attained 95.1, surpassing the best stacked ensembles (94.2) by 0.9 % and the top-performing individual base model (94.6) by 0.5 % . Finally, with PL, the nested ensemble achieved 94.7, yielding improvements of 1.0 % and 0.9 % over the best base model (93.7) and stacked ensembles (93.8), respectively.

4.1.2. Experiment on Stochastic Block Model Graphs

In the second experiment, we explored networks with inherent community structure, modeled using the stochastic block model (SBM). In this framework, a community is conceptualized as a subset of nodes that are densely connected internally, with relatively sparse connections between different communities.
To generate these networks, we use Erdős–Rényi models G ( n , p high ) and G ( n , p low ) to represent within- and between-community connections, respectively. We considered three different SBM configurations, with the number of communities set to k 0 = 2 , 5 , and 10, each of roughly equal size. For each configuration, we generate 150 networks with edge probabilities ( p high , p low ) = ( 0.8 , 0.1 ) , where the number of nodes n is uniformly sampled from [ 80 , 120 ] . Table 5 summarizes the mean network properties for these SBM-generated graphs, demonstrating the increasing heterogeneity in network characteristics as the number of communities increases. These datasets are then split into training and testing sets using a 75–25% ratio.
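The SBM graphs can be generated with igraph's sample_sbm(), as in the following sketch; the block-size handling and helper name are our own choices consistent with the description above.

```r
library(igraph)

# A hedged sketch of the SBM setup: k0 communities of roughly equal size,
# within-community edge probability 0.8, between-community probability 0.1,
# and n drawn uniformly from [80, 120].
make_sbm_graph <- function(k0, p_high = 0.8, p_low = 0.1) {
  n <- sample(80:120, 1)
  block_sizes <- rep(n %/% k0, k0)
  block_sizes[1] <- block_sizes[1] + n %% k0        # absorb the remainder
  P <- matrix(p_low, k0, k0); diag(P) <- p_high     # block connection probabilities
  sample_sbm(n, pref.matrix = P, block.sizes = block_sizes)
}

# 150 networks for each configuration k0 = 2, 5, 10
sbm_graphs <- lapply(rep(c(2, 5, 10), each = 150), make_sbm_graph)
```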
We computed PDs for both the training and testing data for each filter function using a lower-star filtration, and then vectorized these diagrams with three topological summary functions (PS, BF, PL). The results are shown in Table 6.
Similarly to what we observed for the Erdős–Rényi model, the proposed nested ensemble learning framework consistently outperforms both individual base models and stacked ensembles, regardless of the summary function (PS, BF, PL) or homology dimension ( H 0 , H 1 ). Across networks generated with different numbers of communities ( k 0 = 2 , 5 , 10 ) the nested ensemble achieved substantial performance gains. For example, in the H 0 dimension using Persistence Statistics (PS), the nested ensemble achieved an accuracy of 99.7 % , representing improvements of up to 2.4 % over the best individual base model ( 97.3 % ) and 2.5 % over the stacked ensembles ( 97.2 % ). Similar gains were observed with Betti curves (BF) and Persistence Landscapes (PL).
Performance gains were even more pronounced in the H 1 dimension. For PS, the nested ensemble achieved a perfect 100 % accuracy, matching the best individual base model while exhibiting tighter consistency across folds. In the case of BF, the nested ensemble also reached 100 % , exceeding the top-performing individual and stacked ensembles, which ranged between 99.5 % and 99.9 % . Finally, for PL in the H 1 dimension, the nested ensemble attained 99.9 % , yielding improvements of approximately 0.6 % to 1.1 % over the best base and stacked ensembles (around 98.8 % ).

4.1.3. Experiment on Simulated Data—Regression Setting

This experiment evaluates the effectiveness of the proposed nested ensemble model in a regression task involving synthetic graphs generated using the Watts–Strogatz small-world model. The primary objective is to determine whether topological features extracted via persistent homology offer superior predictive power compared to conventional non-topological summaries of node attributes. We generate a total of 200 graphs, with 150 used for training and 50 for testing. Each graph contains a number of nodes randomly sampled from the interval [ 80 , 120 ] , and the number of nearest neighbors is fixed at 10. For each graph, the rewiring probability p, which serves as the regression target, is sampled uniformly from the range [ 0.05 , 0.5 ] . The learning task is to predict this rewiring probability based solely on the structural characteristics of each graph.
To investigate the impact of different feature representations, we compare two feature extraction strategies. In the first approach, topological features are derived from persistent homology by constructing lower-star filtrations that use node attributes as filter functions; PDs are computed for each filtration and vectorized to obtain fixed-length feature representations. The second approach serves as a non-topological baseline. For each graph, we compute the same node-level attributes (degree2, betweenness, and closeness centrality) that are used as filter functions in the topological case. However, instead of constructing filtrations or computing persistence diagrams, we summarize each attribute using five quantile-based statistics: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum. These features are extracted independently for each graph and provide a low-dimensional, interpretable summary of the node attribute distributions, without incorporating any topological or structural information.
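The non-topological baseline can be sketched as follows. Ordinary degree centrality stands in for the custom degree2 attribute, whose exact definition is given earlier in the paper, and the helper name quantile_features is ours.

```r
library(igraph)

# Five quantile summaries (min, Q1, median, Q3, max) of each node-level
# attribute; degree() is a stand-in for the paper's custom degree2 centrality.
quantile_features <- function(graph) {
  attrs <- list(closeness   = closeness(graph),
                betweenness = betweenness(graph),
                degree      = degree(graph))
  unlist(lapply(attrs, quantile, probs = c(0, 0.25, 0.5, 0.75, 1)))
}

g <- sample_smallworld(dim = 1, size = 100, nei = 5, p = 0.2)
quantile_features(g)   # one 15-dimensional feature vector per graph
```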
For each feature extraction strategy, we train three base models: a neural network model (nnet), a bagging model (treebag), and a support vector machine with a radial basis function kernel (svmRadial), all implemented with the default hyperparameter settings provided by the caret package in R [45]. Our focus here is not on optimizing the performance of individual models, but rather on maintaining a consistent baseline to assess the relative effectiveness of the different feature extraction strategies. The predictions from these base models are combined using a generalized linear model (GLM) as a Level-1 meta-learner to form the filter-specific ensembles. At Level-2, a Random Forest (RF) is employed to integrate the outputs of the three filter-specific ensembles— E Degree 2 , E Betweenness , and E Closeness —to produce the final regression prediction. Model performance is evaluated using the Root Mean Squared Error (RMSE), averaged over 10 independent iterations. For the topological feature extraction strategy, we report the performance of the H 1 features, although H 0 also demonstrated competitive performance. The regression results for the simulated data under both feature extraction strategies are summarized in Table 7.
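The three-level architecture for the regression task can be sketched in R with the caret package as follows. The feature matrices X_close, X_betw, and X_deg2 (vectorized PDs per filter function) and the target vector y are assumed placeholders, and for brevity the meta-learners are fit on in-sample base-model predictions, whereas a faithful implementation would stack on out-of-fold predictions.

```r
library(caret)

# Sketch of the nested ensemble for regression (not the authors' exact code).
# X_close, X_betw, X_deg2 hold vectorized PDs for each filter function; y is
# the numeric target (the rewiring probability).
ctrl <- trainControl(method = "cv", number = 10)
base_methods <- c("nnet", "treebag", "svmRadial")

# Level-0 base models + Level-1 GLM meta-learner = one filter-specific ensemble.
# For brevity the meta-learner is fit on in-sample base-model predictions.
filter_ensemble <- function(X, y) {
  base <- lapply(base_methods, function(m)
    train(x = X, y = y, method = m, trControl = ctrl))
  names(base) <- base_methods
  meta_X <- as.data.frame(lapply(base, predict, newdata = X))
  list(base = base,
       meta = train(x = meta_X, y = y, method = "glm", trControl = ctrl))
}

predict_ensemble <- function(ens, X) {
  meta_X <- as.data.frame(lapply(ens$base, predict, newdata = X))
  predict(ens$meta, newdata = meta_X)
}

E_close <- filter_ensemble(X_close, y)
E_betw  <- filter_ensemble(X_betw,  y)
E_deg2  <- filter_ensemble(X_deg2,  y)

# Level-2 meta-learner: a random forest over the filter-specific ensemble outputs.
level2_X <- data.frame(close = predict_ensemble(E_close, X_close),
                       betw  = predict_ensemble(E_betw,  X_betw),
                       deg2  = predict_ensemble(E_deg2,  X_deg2))
nested_fit <- train(x = level2_X, y = y, method = "rf", trControl = ctrl)
```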
Across all experimental settings, the proposed nested ensemble model consistently outperforms both the individual base models and the standard stacked ensembles. When using topological features derived from persistent homology, the nested ensemble achieves the lowest overall RMSE values. Specifically, in the case of the H 1 dimension using persistence statistics (PS), the nested ensemble attains an RMSE of 3.95 ± 0.57 , compared to 4.01 ± 0.52 for the best-performing stacked ensemble. A similar trend is observed for H 1 using the Betti function (BF), where the nested ensemble achieves an RMSE of 4.22 ± 0.20 , outperforming the best stacked ensemble at 4.28 ± 0.23 and all individual base models.
In contrast, performance deteriorates across all models under the non-topological baseline built from quantile summaries of the filter functions. While the nested ensemble still maintains a slight edge over the stacked ensemble ( 5.91 ± 0.61 vs. 6.00 ± 0.58 ), the improvement is less pronounced. Furthermore, individual base models such as the neural network and the support vector machine perform markedly worse in this setting, with RMSE values reaching 12.94 ± 0.78 and 12.72 ± 1.20 , respectively. These results suggest that topological features offer a more structured and informative representation for regression tasks than simple summary statistics of node-level graph attributes.

4.2. Nested Ensemble on Real Data

This section evaluates the proposed nested ensemble model on four benchmark datasets from the TUDataset collection: IMDB-BINARY, IMDB-MULTI, REDDIT-BINARY, and PROTEINS. These datasets are derived from real-world networks and are widely used for benchmarking graph classification algorithms. Each dataset consists of graphs labeled into distinct classes, and the datasets vary in the number of graphs, graph size, and class distribution. A detailed summary of their characteristics is presented in Table 8. Because some datasets exhibit class imbalance, we apply the ROSE (Random Over-Sampling Examples) technique [46,47] to balance the training data and mitigate potential bias during model learning.
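For the binary-labeled datasets, the balancing step can be sketched as follows; the data frame train_df and its factor label column class are assumed placeholders, not objects from our code.

```r
library(ROSE)

# Sketch: balance a binary training set with ROSE before fitting the Level-0
# models. 'train_df' (vectorized topological features plus a factor label
# 'class') is an assumed placeholder.
set.seed(3)
balanced_df <- ROSE(class ~ ., data = train_df)$data
table(balanced_df$class)   # roughly equal class counts after resampling
```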
Table 8. Summary of real-world graph datasets used in the experiments of Section 4.2.
Dataset | No. of Graphs | No. of Classes | Avg. No. of Nodes | Avg. No. of Edges
IMDB-BINARY | 1000 | 2 | 19.77 | 96.53
IMDB-MULTI | 1500 | 3 | 13.00 | 65.94
REDDIT-BINARY | 2000 | 2 | 429.60 | 497.80
PROTEINS | 1113 | 2 | 39.31 | 72.82
For each dataset, a 75–25% train–test split is performed. PDs are computed for homological dimensions 0 and 1 using lower-star filtrations induced by three filter functions (closeness, betweenness, and degree2), where each filtration is guided by a function g whose values are the filter function values normalized to the interval [ 0 , 1 ] . For each set of vectorized features corresponding to a filter function, three base models are trained: Bagging, Random Forest, and Boosting. These base models are combined using a generalized linear model (GLM) as a Level-1 meta-learner, producing a filter-specific stacked ensemble. A Random Forest (RF) then serves as a Level-2 meta-learner, integrating the predictions of the three filter-specific stacked ensembles to produce the final output of the nested ensemble model. The classification results on the real-world datasets for the H 0 -based features are presented in Table 9; although features derived from H 1 were also competitive, we report results only for the homological dimension with the superior predictive performance. The results clearly show that the nested ensemble learning framework consistently achieves the highest accuracy, outperforming both the individual base models and the stacked ensembles. Across all topological summary functions, the nested ensemble achieves gains of approximately 0.1 % to 5 % over the individual base models and 0.1 % to 4.9 % over the stacked ensembles. This pattern holds across all datasets, highlighting the robustness and effectiveness of the nested ensemble framework.
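As a concrete illustration of the vectorization step, the sketch below computes a Betti function (BF) summary by hand from a PD stored as a (dimension, Birth, Death) matrix. The grid length and interval convention are our choices; in practice a dedicated package such as TDAvec [35] can be used for PS, BF, and PL.

```r
# Sketch of the Betti function (BF) vectorization for H0 or H1 features.
# 'pd' is a persistence diagram stored as a matrix with columns
# (dimension, Birth, Death); grid length and interval convention are our choices.
betti_curve <- function(pd, homDim, grid = seq(0, 1, length.out = 20)) {
  pts <- pd[pd[, 1] == homDim, , drop = FALSE]
  sapply(grid, function(t) sum(pts[, 2] <= t & t < pts[, 3]))  # Betti number at scale t
}

# Example: a 20-dimensional H0 feature vector for one diagram
# bf_h0 <- betti_curve(pd, homDim = 0)
```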
Table 10 compares the classification accuracies achieved by the proposed nested ensemble framework against several graph neural network (GNN) and graph kernel baselines on the four benchmark datasets. The baseline results are taken directly from their original publications. As shown in the table, the nested ensemble outperforms these baselines on most datasets and ranks among the top three methods when its accuracy is slightly lower. This highlights the effectiveness of the nested ensemble framework for learning graph-level properties across different domains. It is important to note, however, that the nested ensemble did not achieve equally strong results across all summary functions, particularly on datasets with imbalanced classes such as PROTEINS. Competitive performance relative to the baselines was observed primarily with Persistence Statistics (PS), whereas results with Betti functions (BF) and Persistence Landscapes (PL) were generally less favorable.

5. Conclusions

We propose an efficient framework for extracting topological features directly from graph structures, without the need to embed the graphs into metric spaces. Our method builds simplicial complexes from graphs and computes persistent homology using lower-star filtrations induced by three filter functions: closeness centrality, betweenness centrality, and a custom second-order degree centrality (degree2). This approach captures the intrinsic topology of the graph while maintaining computational efficiency.
Building on these topological features, we propose a nested ensemble learning framework for both classification and regression tasks on graph-structured data. This framework extends standard stacking approaches by incorporating multiple filter functions, selected in a data-driven manner, which allows the model to learn from diverse topological perspectives and thereby enhances both accuracy and robustness. For each filter function, we train three base models whose predictions are combined through a meta-learner to create a filter-specific ensemble. A second-level meta-learner then learns to optimally combine these ensembles, effectively assigning weights to each filter function based on the data. We evaluated this framework extensively on both synthetic and real-world graph datasets across classification and regression tasks. In all experiments, our method consistently delivered performance gains. On the real-world datasets, it generally outperformed state-of-the-art baseline methods and, even in the worst cases, still ranked among the top three models.
Despite its strong performance, a key limitation of the proposed framework is its sensitivity to class imbalance. When one class is significantly underrepresented, the nested ensemble may underperform relative to individual models, indicating a bias toward majority class patterns. As part of future work, we plan to enhance the framework’s ability to handle imbalanced data more effectively. Additionally, we aim to explore the theoretical underpinnings of the architecture, including a formal analysis of its bias–variance trade-off and generalization properties. Such insights would help clarify when and why the nested ensemble performs well and further support its application in real-world settings.

Author Contributions

Conceptualization, U.I.; Methodology, I.A. and U.I.; Software, I.A.; Validation, I.A.; Formal analysis, I.A. and U.I.; Investigation, I.A.; Resources, I.A.; Data curation, I.A.; Writing—original draft, I.A.; Writing—review & editing, U.I.; Visualization, I.A.; Supervision, U.I.; Project administration, U.I. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

All data supporting the findings of this study are included in this article. The benchmark datasets are publicly available at TUDatasets [18]. Further inquiries may be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259. [Google Scholar] [CrossRef]
  2. Carlsson, G. Topology and data. Bull. Am. Math. Soc. 2009, 46, 255–308. [Google Scholar] [CrossRef]
  3. Hajij, M.; Wang, B.; Scheidegger, C.; Rosen, P. Persistent homology guided exploration of time-varying graphs. arXiv 2017, arXiv:1707.06683. [Google Scholar]
  4. You, K.; Kim, I.; Jin, I.H.; Jeon, M.; Shung, D. Comparing multiple latent space embeddings using topological analysis. arXiv 2022, arXiv:2208.12435. [Google Scholar] [CrossRef]
  5. Hajij, M.; Zamzmi, G.; Cai, X. Persistent Homology and Graphs Representation Learning. arXiv 2021, arXiv:2102.12926. [Google Scholar] [CrossRef]
  6. Islambekov, U.; Pathirana, H.; Khormali, O.; Akcora, C.; Smirnova, E. A fast topological approach for predicting anomalies in time-varying graphs. arXiv 2023, arXiv:2305.06523. [Google Scholar] [CrossRef]
  7. Niepert, M.; Ahmed, M.; Kutzkov, K. Learning Convolutional Neural Networks for Graphs. arXiv 2016, arXiv:1605.05273. [Google Scholar] [CrossRef]
  8. Zhao, Q.; Wang, Y. Learning metrics for persistence-based summaries and applications for graph classification. arXiv 2019, arXiv:1904.12189. [Google Scholar] [CrossRef]
  9. Carrière, M.; Cuturi, M.; Oudot, S. PersLay: A neural network layer for persistence diagrams and new graph topological signatures. In Proceedings of the 23rd International Conference on Artificial Intelligence and Statistics (AISTATS), Online, 26–28 August 2020; Volume 108, pp. 2786–2796. [Google Scholar]
  10. Hofer, C.; Kwitt, R.; Niethammer, M.; Uhl, A. Deep Learning with Topological Signatures. arXiv 2018, arXiv:1707.04041. [Google Scholar] [CrossRef]
  11. Agerberg, J.; Guidolin, A.; Martinelli, A.; Roos Hoefgeest, P.; Eklund, D.; Scolamiero, M. Certifying Robustness via Topological Representations. arXiv 2025, arXiv:2501.10876. [Google Scholar] [CrossRef]
  12. O’Bray, L.; Rieck, B.; Borgwardt, K. Filtration Curves for Graph Representation. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Virtual, 14–18 August 2021; pp. 1304–1314. [Google Scholar]
  13. Nishikawa, N.; Ike, Y.; Yamanishi, K. Adaptive Topological Feature via Persistent Homology: Filtration Learning for Point Clouds. In Proceedings of the Advances in Neural Information Processing Systems 36 (NeurIPS 2023), New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
  14. Cornell, F. Using topological autoencoders as a filtering function for global and local topology. arXiv 2021, arXiv:2012.03383. [Google Scholar] [CrossRef]
  15. Horn, M.; Brouwer, E.D.; Moor, M.; Moreau, Y.; Rieck, B.; Borgwardt, K. Topological Graph Neural Networks. arXiv 2022, arXiv:2102.07835. [Google Scholar] [PubMed]
  16. Bu, F.; Kang, S.; Shin, K. Interplay between Topology and Edge Weights in Real-World Graphs: Concepts, Patterns, and an Algorithm. arXiv 2023, arXiv:2305.09083. [Google Scholar] [CrossRef]
  17. Loiseaux, D.; Scoccola, L.; Carrière, M.; Botnan, M.B.; Oudot, S. Stable Vectorization of Multiparameter Persistent Homology using Signed Barcodes as Measures. In Advances in Neural Information Processing Systems; Oh, A., Naumann, T., Globerson, A., Saenko, K., Hardt, M., Levine, S., Eds.; Curran Associates, Inc.: Red Hook, NY, USA, 2023; Volume 36, pp. 68316–68342. Available online: https://proceedings.neurips.cc/paper_files/paper/2023/file/d75c474bc01735929a1fab5d0de3b189-Paper-Conference.pdf (accessed on 11 August 2025).
  18. Kersting, K.; Kriege, N.M.; Morris, C.; Mutzel, P.; Neumann, M. Benchmark Data Sets for Graph Kernels; Technische Universität Dortmund: Dortmund, Germany, 2016. [Google Scholar]
  19. Edelsbrunner, H.; Harer, J.L. Computational Topology: An Introduction; American Mathematical Society: Providence, RI, USA, 2010. [Google Scholar]
  20. Ghrist, R. Barcodes: The persistent topology of data. Bull. Am. Math. Soc. 2008, 45, 61–75. [Google Scholar] [CrossRef]
  21. Otter, N.; Porter, M.A.; Tillmann, U.; Grindrod, P.; Harrington, H.A. A roadmap for the computation of persistent homology. EPJ Data Sci. 2017, 6, 17. [Google Scholar] [CrossRef]
  22. Zhang, M.; Kalies, W.D.; Kelso, J.S.; Tognoli, E. Topological portraits of multiscale coordination dynamics. J. Neurosci. Methods 2020, 339, 108672. [Google Scholar] [CrossRef]
  23. Reduced Homology. Available online: https://en.wikipedia.org/wiki/Reduced_homology (accessed on 11 August 2025).
  24. Cohen-Steiner, D.; Edelsbrunner, H.; Harer, J. Stability of persistence diagrams. Discret. Comput. Geom. 2007, 37, 103–120. [Google Scholar] [CrossRef]
  25. Bubenik, P. The Persistence Landscape and Some of Its Properties. In Topological Data Analysis; Springer International Publishing: Berlin/Heidelberg, Germany, 2020; pp. 97–117. [Google Scholar]
  26. Berry, E.; Chen, Y.C.; Cisewski-Kehe, J.; Fasy, B.T. Functional summaries of persistence diagrams. J. Appl. Comput. Topol. 2020, 4, 211–262. [Google Scholar] [CrossRef]
  27. Bubenik, P. Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. 2015, 16, 77–102. [Google Scholar]
  28. Chazal, F.; Fasy, B.T.; Lecci, F.; Rinaldo, A.; Wasserman, L. Stochastic convergence of persistence landscapes and silhouettes. In Proceedings of the Thirtieth Annual Symposium on Computational Geometry, Kyoto, Japan, 8–11 June 2014; pp. 474–483. [Google Scholar]
  29. Adams, H.; Chepushtanova, S.; Emerson, T.; Hanson, E.; Kirby, M.; Motta, F.; Neville, R.; Peterson, C.; Shipman, P.; Ziegelmeier, L. Persistence Images: A Stable Vector Representation of Persistent Homology. arXiv 2016, arXiv:1507.06217. [Google Scholar] [CrossRef]
  30. Atienza, N.; Gonzalez-Díaz, R.; Soriano-Trigueros, M. On the stability of persistent entropy and new summary functions for topological data analysis. Pattern Recognit. 2020, 107, 107509. [Google Scholar] [CrossRef]
  31. Ali, D.; Asaad, A.; Jimenez, M.J.; Nanda, V.; Paluzo-Hidalgo, E.; Soriano-Trigueros, M. A Survey of Vectorization Methods in Topological Data Analysis. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 45, 14069–14080. [Google Scholar] [CrossRef]
  32. Chazal, F.; Michel, B. An introduction to topological data analysis: Fundamental and practical aspects for data scientists. Front. Artif. Intell. 2021, 4, 667963. [Google Scholar] [CrossRef]
  33. Chung, Y.M.; Lawson, A. Persistence curves: A canonical framework for summarizing persistence diagrams. Adv. Comput. Math. 2022, 48, 6. [Google Scholar] [CrossRef]
  34. Islambekov, U.D.; Pathirana, H. Vector Summaries of Persistence Diagrams for Permutation-based Hypothesis Testing. Found. Data Sci. 2024, 6, 41–61. [Google Scholar] [CrossRef]
  35. Islambekov, U.; Luchinsky, A. TDAvec: Vector Summaries of Persistence Diagrams. 2025. Available online: https://github.com/uislambekov/TDAvec (accessed on 11 August 2025).
  36. Johnson, M.; Jung, J.H. Instability of the Betti Sequence for Persistent Homology and a Stabilized Version of the Betti Sequence. arXiv 2021, arXiv:2109.09218. [Google Scholar] [CrossRef]
  37. Erdős, P.; Rényi, A. On random graphs. Publ. Math. 1959, 6, 290–297. [Google Scholar] [CrossRef]
  38. Holland, P.W.; Laskey, K.B.; Leinhardt, S. Stochastic blockmodels: First steps. Soc. Netw. 1983, 5, 109–137. [Google Scholar] [CrossRef]
  39. Watts, D.J.; Strogatz, S.H. Collective dynamics of ‘small-world’ networks. Nature 1998, 393, 440–442. [Google Scholar] [CrossRef] [PubMed]
  40. Damke, C.; Melnikov, V.; Hüllermeier, E. A Novel Higher-order Weisfeiler-Lehman Graph Convolution. arXiv 2020, arXiv:2007.00346. [Google Scholar]
  41. Errica, F.; Podda, M.; Bacciu, D.; Micheli, A. A Fair Comparison of Graph Neural Networks for Graph Classification. arXiv 2022, arXiv:1912.09893. [Google Scholar] [CrossRef]
  42. Ivanov, S.; Burnaev, E. Anonymous Walk Embeddings. In Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 10–15 July 2018; pp. 2186–2195. [Google Scholar]
  43. Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How Powerful are Graph Neural Networks? arXiv 2019, arXiv:1810.00826. [Google Scholar] [CrossRef]
  44. Zhang, M.; Cui, Z.; Neumann, M.; Chen, Y. An End-to-End Deep Learning Architecture for Graph Classification. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  45. Kuhn, M. Building Predictive Models in R Using the caret Package. J. Stat. Softw. 2008, 28, 1–26. [Google Scholar] [CrossRef]
  46. Lunardon, N.; Menardi, G.; Torelli, N. ROSE: A Package for Binary Imbalanced Learning. R J. 2014, 6, 82–92. [Google Scholar] [CrossRef]
  47. Menardi, G.; Torelli, N. Training and assessing classification rules with imbalanced data. Data Min. Knowl. Discov. 2014, 28, 92–122. [Google Scholar] [CrossRef]
  48. Kerber, M.; Russold, F. Graphcode: Learning from multiparameter persistent homology using graph neural networks. arXiv 2024, arXiv:2405.14302. [Google Scholar] [CrossRef]
Figure 1. Geometric representation of simplices of increasing dimension—vertices (0-simplices), edges (1-simplices), triangles (2-simplices), and a tetrahedron (3-simplex)—shown on the left. On the right, these building blocks are combined to form a simplicial complex, illustrating how simplices connect through shared faces. Image reproduced from Zhang et al. [22].
Figure 2. Simplicial complex X.
Figure 3. A simple house graph G consisting of seven nodes and eight edges.
Figure 4. Illustration of the lower-star filtration at filtration steps t = 1 , 2 , 3 , 4 . Simplices are added according to the degree function g defined on the nodes of graph G. Initially, only isolated vertices are added. As the filtration parameter t increases, edges and higher-order simplices (triangles) are included whenever all participating vertices have degrees less than or equal to t.
Figure 6. (a) illustrates a sample PD, while (b–d) display its corresponding vectorized summaries: persistence statistics, Betti function, and generalized persistence landscape, respectively. The generalized persistence landscape in (d) was constructed using the Epanechnikov kernel.
Figure 7. This figure illustrates the nested ensemble architecture. The process begins with graphs as input features, from which persistence diagrams are computed using lower-star filtrations induced by filter functions. The filter functions are node attributes: closeness, betweenness, and a custom variant of degree centrality called degree2. For each filter function, M base models are trained independently. Their outputs are then combined by Level-1 meta-learners into filter-specific ensembles. Finally, a Level-2 meta-learner integrates these ensembles to produce the final nested ensemble prediction.
Table 2. Comparison of median run-time costs (in seconds) for computing PDs from Erdős–Rényi graphs under two experimental setups: varying the number of graphs with fixed edge probabilities (left), and fixing the number of graphs while randomly sampling edge probabilities (right). Each run includes computing node attributes (closeness, betweenness, and degree2), constructing lower-star filtrations, and extracting the PDs.
(a) Varying N, fixed probabilities:
N | p1 | p2 | Runtime (s)
100 | 0.05 | 0.075 | 1.70
200 | 0.05 | 0.075 | 3.02
300 | 0.05 | 0.075 | 5.26
400 | 0.05 | 0.075 | 5.97
500 | 0.05 | 0.075 | 7.71
600 | 0.05 | 0.075 | 9.58
700 | 0.05 | 0.075 | 10.33
800 | 0.05 | 0.075 | 12.58
900 | 0.05 | 0.075 | 14.29
1000 | 0.05 | 0.075 | 15.88

(b) Fixed N = 100, random probabilities:
Trial | p1 | p2 | Runtime (s)
1 | 0.2326 | 0.1116 | 22.30
2 | 0.4987 | 0.2280 | 202.84
3 | 0.0601 | 0.0206 | 2.39
4 | 0.3972 | 0.4311 | 243.71
5 | 0.0876 | 0.3472 | 86.04
6 | 0.3606 | 0.2420 | 109.88
7 | 0.2169 | 0.3658 | 103.83
8 | 0.3118 | 0.0688 | 54.86
9 | 0.1617 | 0.3902 | 97.27
10 | 0.4599 | 0.3481 | 221.22
Table 3. Mean network properties for ERM graphs with different edge probabilities.
Class | Density | AD | ASD | Diameter | Transitivity
class1 | 0.35 | 34.72 | 1.65 | 2.02 | 0.35
class2 | 0.30 | 29.79 | 1.70 | 2.35 | 0.30
Table 4. Classification accuracy (mean ± standard deviation from 10-fold cross-validation) across homology dimensions ( H 0 , H 1 ) using three summary functions (PS, BF, PL). Features are extracted from three filter functions: degree2, betweenness, and closeness. For each filter function, Bagging, Random Forest, and Boosting models are trained and combined into a stacked ensemble. These stacked ensembles are then integrated into a final nested ensemble model.
Centrality | Bagging | Random Forest | Boosting | Stacked Ensemble
Closeness H0, PS | 59.3 ± 4.8 | 60.7 ± 4.1 | 61.6 ± 6.7 | 61.6 ± 5.9
Betweenness H0, PS | 64.6 ± 3.0 | 66.8 ± 2.7 | 67.2 ± 3.5 | 67.3 ± 4.3
Degree2 H0, PS | 72.3 ± 2.8 | 72.5 ± 3.0 | 72.0 ± 3.3 | 72.8 ± 4.1
Nested Ensemble H0, PS | 74.6 ± 5.9
Closeness H0, BF | 51.5 ± 3.7 | 53.7 ± 2.9 | 54.4 ± 5.0 | 53.0 ± 3.5
Betweenness H0, BF | 66.6 ± 4.8 | 66.9 ± 6.2 | 68.3 ± 6.0 | 68.8 ± 5.5
Degree2 H0, BF | 70.1 ± 6.2 | 70.5 ± 5.7 | 69.9 ± 6.5 | 69.5 ± 6.2
Nested Ensemble H0, BF | 72.1 ± 5.7
Closeness H0, PL | 55.9 ± 5.3 | 58.3 ± 5.3 | 58.3 ± 2.8 | 56.6 ± 4.3
Betweenness H0, PL | 65.2 ± 4.2 | 65.4 ± 5.7 | 65.5 ± 5.2 | 66.5 ± 5.3
Degree2 H0, PL | 70.9 ± 6.4 | 73.5 ± 6.2 | 72.9 ± 6.6 | 73.5 ± 7.0
Nested Ensemble H0, PL | 75.8 ± 8.6
Closeness H1, PS | 90.8 ± 3.6 | 91.0 ± 3.9 | 91.8 ± 2.6 | 92.0 ± 2.4
Betweenness H1, PS | 93.8 ± 3.2 | 93.8 ± 3.3 | 94.1 ± 3.2 | 94.1 ± 2.6
Degree2 H1, PS | 93.3 ± 2.4 | 93.3 ± 2.6 | 94.0 ± 2.3 | 93.8 ± 2.4
Nested Ensemble H1, PS | 96.5 ± 3.2
Closeness H1, BF | 82.7 ± 3.2 | 82.8 ± 3.8 | 69.1 ± 3.7 | 83.3 ± 3.1
Betweenness H1, BF | 92.9 ± 2.6 | 94.0 ± 2.5 | 93.5 ± 2.3 | 94.2 ± 2.3
Degree2 H1, BF | 87.2 ± 4.2 | 88.3 ± 3.8 | 88.2 ± 3.9 | 89.1 ± 4.2
Nested Ensemble H1, BF | 95.1 ± 2.6
Closeness H1, PL | 72.3 ± 4.9 | 72.5 ± 3.7 | 73.9 ± 2.9 | 75.1 ± 3.3
Betweenness H1, PL | 92.8 ± 2.1 | 92.5 ± 2.5 | 93.7 ± 2.9 | 93.0 ± 2.8
Degree2 H1, PL | 80.4 ± 0.4 | 81.1 ± 4.2 | 81.4 ± 3.4 | 81.1 ± 3.8
Nested Ensemble H1, PL | 94.7 ± 3.0
Table 5. Mean network properties for SBM graphs with k 0 = 2 , 5 , 10 .
k0 | Density | AD | ASD | Diameter | Transitivity
2 | 0.45 | 44.17 | 1.55 | 2.41 | 0.66
5 | 0.23 | 22.92 | 1.79 | 3.00 | 0.41
10 | 0.17 | 16.50 | 1.92 | 3.00 | 0.26
Table 6. Classification accuracy (mean ± standard deviation from 10-fold cross-validation) is reported across homology dimensions ( H 0 , H 1 ) using three topological summary functions (PS, BF, PL). Features are extracted from three filter functions. For each filter, models are trained and combined into a stacked ensemble, which is then further integrated into a final nested ensemble using a meta-model.
Centrality | Bagging | Random Forest | Boosting | Stacked Ensemble
Closeness H0, PS | 89.2 ± 3.0 | 91.5 ± 2.4 | 90.8 ± 2.5 | 91.7 ± 1.6
Betweenness H0, PS | 96.1 ± 1.6 | 96.3 ± 1.4 | 95.9 ± 2.0 | 95.9 ± 1.8
Degree2 H0, PS | 96.9 ± 1.0 | 97.1 ± 1.6 | 97.3 ± 1.3 | 97.2 ± 1.4
Nested Ensemble H0, PS | 99.7 ± 0.4
Closeness H0, BF | 84.3 ± 2.4 | 84.1 ± 2.7 | 84.9 ± 2.8 | 85.6 ± 2.8
Betweenness H0, BF | 95.0 ± 1.9 | 95.9 ± 1.4 | 95.5 ± 2.4 | 95.4 ± 1.6
Degree2 H0, BF | 96.3 ± 1.7 | 96.3 ± 1.5 | 96.0 ± 1.5 | 96.3 ± 1.5
Nested Ensemble H0, BF | 98.9 ± 0.9
Closeness H0, PL | 90.4 ± 2.2 | 91.6 ± 1.9 | 91.1 ± 1.7 | 91.4 ± 1.6
Betweenness H0, PL | 93.1 ± 2.5 | 93.6 ± 2.9 | 94.1 ± 2.1 | 94.1 ± 2.9
Degree2 H0, PL | 95.9 ± 1.5 | 95.8 ± 1.3 | 95.9 ± 1.4 | 96.0 ± 1.7
Nested Ensemble H0, PL | 99.1 ± 1.0
Closeness H1, PS | 99.5 ± 0.6 | 99.8 ± 0.4 | 99.7 ± 0.4 | 99.7 ± 0.4
Betweenness H1, PS | 99.4 ± 1.0 | 100 ± 0.0 | 99.8 ± 0.6 | 99.7 ± 0.6
Degree2 H1, PS | 99.7 ± 0.4 | 100 ± 0.0 | 100 ± 0.0 | 99.9 ± 0.3
Nested Ensemble H1, PS | 100 ± 0.0
Closeness H1, BF | 98.9 ± 1.1 | 99.5 ± 0.6 | 99.3 ± 0.8 | 99.3 ± 0.8
Betweenness H1, BF | 99.7 ± 0.6 | 99.8 ± 0.4 | 99.8 ± 0.4 | 99.9 ± 0.3
Degree2 H1, BF | 99.5 ± 0.8 | 99.2 ± 1.0 | 99.5 ± 0.8 | 99.7 ± 0.4
Nested Ensemble H1, BF | 100 ± 0.0
Closeness H1, PL | 96.2 ± 1.9 | 97.9 ± 1.3 | 97.7 ± 0.6 | 97.9 ± 1.3
Betweenness H1, PL | 97.8 ± 1.3 | 99.3 ± 0.8 | 98.4 ± 0.9 | 98.8 ± 1.1
Degree2 H1, PL | 97.7 ± 1.5 | 98.4 ± 1.5 | 98.0 ± 1.5 | 98.5 ± 1.1
Nested Ensemble H1, PL | 99.9 ± 0.3
Table 7. Root Mean Squared Error (RMSE) values for the regression task on simulated Watts–Strogatz graphs, comparing individual base models (nnet, treebag, svmRadial), stacked ensembles, and the proposed nested ensemble model. Results are reported for two feature extraction strategies: (i) H 1 -based topological features derived from persistent homology, and (ii) graph-theoretic features based on the filter functions (closeness, betweenness, and degree2). Lower RMSE values indicate better model performance.
Centrality | nnet | Bagging (treebag) | svmRadial | Stacked Ensemble
Topological Features
Closeness H1, PS | 3.99 ± 0.47 | 5.08 ± 0.65 | 4.87 ± 0.72 | 4.01 ± 0.52
Betweenness H1, PS | 4.81 ± 3.13 | 5.57 ± 0.89 | 5.15 ± 0.95 | 4.68 ± 2.69
Degree2 H1, PS | 8.98 ± 8.1 | 5.54 ± 1.02 | 4.86 ± 0.71 | 6.02 ± 2.43
Nested Ensemble H1, PS | 3.95 ± 0.57
Closeness H1, BF | 6.85 ± 0.56 | 8.50 ± 0.75 | 6.86 ± 0.47 | 5.56 ± 0.48
Betweenness H1, BF | 5.62 ± 0.63 | 4.56 ± 0.29 | 6.57 ± 0.25 | 4.28 ± 0.23
Degree2 H1, BF | 5.21 ± 0.49 | 5.62 ± 0.39 | 6.68 ± 0.40 | 4.76 ± 0.37
Nested Ensemble H1, BF | 4.22 ± 0.20
Non-topological Features
Closeness | 12.4 ± 1.07 | 12.38 ± 1.66 | 12.72 ± 1.20 | 12.31 ± 1.11
Betweenness | 12.94 ± 0.78 | 12.14 ± 9.50 | 11.69 ± 1.05 | 11.6 ± 0.96
Degree2 | 6.36 ± 0.83 | 6.06 ± 0.55 | 6.32 ± 0.72 | 6.00 ± 0.58
Nested Ensemble | 5.91 ± 0.61
Table 9. Classification accuracy results for PS, BF, and PL constructed from H 0 features.
Centrality | Bagging | Random Forest | Boosting | Stacked Ensemble
REDDIT-BINARY
Closeness H0, PS | 87.2 ± 1.5 | 87.4 ± 1.1 | 86.6 ± 1.5 | 87.3 ± 1.5
Betweenness H0, PS | 87.8 ± 1.9 | 88.1 ± 2.0 | 87.4 ± 2.1 | 88.1 ± 1.7
Degree2 H0, PS | 89.5 ± 1.0 | 90.2 ± 1.3 | 89.3 ± 1.5 | 90.2 ± 1.3
Nested Ensemble H0, PS | 90.3 ± 1.6
Closeness H0, BF | 58.7 ± 2.0 | 64.8 ± 1.2 | 63.7 ± 2.3 | 64.8 ± 2.2
Betweenness H0, BF | 74.0 ± 1.5 | 78.0 ± 1.1 | 79.0 ± 1.2 | 78.8 ± 1.5
Degree2 H0, BF | 84.8 ± 1.5 | 86.1 ± 1.4 | 84.6 ± 1.7 | 86.1 ± 1.4
Nested Ensemble H0, BF | 88.1 ± 1.1
Closeness H0, PL | 75.7 ± 1.8 | 75.5 ± 1.5 | 76.8 ± 1.4 | 76.9 ± 1.5
Betweenness H0, PL | 71.7 ± 1.1 | 73.9 ± 3.2 | 78.6 ± 1.8 | 78.8 ± 1.2
Degree2 H0, PL | 77.7 ± 1.3 | 79.4 ± 2.2 | 80.3 ± 1.4 | 80.3 ± 1.5
Nested Ensemble H0, PL | 83.7 ± 2.1
IMDB-MULTI
Closeness H0, PS | 48.7 ± 2.1 | 48.5 ± 2.3 | 48.0 ± 2.1 | 48.7 ± 2.0
Betweenness H0, PS | 40.4 ± 1.4 | 40.6 ± 0.8 | 40.2 ± 1.7 | 40.9 ± 1.7
Degree2 H0, PS | 48.7 ± 3.1 | 48.8 ± 2.4 | 49.4 ± 2.9 | 49.4 ± 2.2
Nested Ensemble H0, PS | 54.1 ± 1.1
Closeness H0, BF | 47.8 ± 2.2 | 46.7 ± 3.2 | 46.4 ± 1.8 | 48.2 ± 2.2
Betweenness H0, BF | 39.1 ± 1.5 | 39.2 ± 1.5 | 39.9 ± 2.0 | 39.3 ± 1.9
Degree2 H0, BF | 46.9 ± 1.9 | 46.5 ± 2.0 | 46.3 ± 2.6 | 46.9 ± 1.7
Nested Ensemble H0, BF | 52.1 ± 3.0
Closeness H0, PL | 47.8 ± 2.4 | 47.1 ± 2.9 | 46.7 ± 2.7 | 47.5 ± 2.6
Betweenness H0, PL | 39.2 ± 1.5 | 39.3 ± 1.3 | 39.5 ± 1.6 | 40.3 ± 1.8
Degree2 H0, PL | 46.9 ± 2.6 | 47.2 ± 3.2 | 47.1 ± 2.5 | 47.5 ± 2.5
Nested Ensemble H0, PL | 52.2 ± 2.8
PROTEINS
Closeness H0, PS | 63.9 ± 5.2 | 69.4 ± 1.9 | 63.6 ± 5.1 | 71.0 ± 1.9
Betweenness H0, PS | 55.0 ± 8.6 | 68.2 ± 4.3 | 64.1 ± 6.5 | 70.0 ± 2.0
Degree2 H0, PS | 64.7 ± 3.1 | 66.8 ± 2.2 | 64.6 ± 3.0 | 68.8 ± 2.1
Nested Ensemble H0, PS | 78.0 ± 2.3
Closeness H0, BF | 46.5 ± 2.8 | 51.7 ± 2.8 | 53.5 ± 3.4 | 54.4 ± 3.2
Betweenness H0, BF | 49.3 ± 2.4 | 52.5 ± 1.2 | 53.7 ± 1.7 | 55.1 ± 1.9
Degree2 H0, BF | 48.9 ± 2.6 | 51.1 ± 1.6 | 51.9 ± 2.0 | 53.8 ± 2.4
Nested Ensemble H0, BF | 68.0 ± 3.3
Closeness H0, PL | 59.7 ± 0.2 | 62.0 ± 1.4 | 59.8 ± 0.5 | 69.2 ± 2.4
Betweenness H0, PL | 40.5 ± 0.2 | 41.3 ± 1.7 | 40.6 ± 0.3 | 51.8 ± 6.2
Degree2 H0, PL | 42.4 ± 5.9 | 61.7 ± 7.0 | 40.5 ± 4.0 | 61.7 ± 6.1
Nested Ensemble H0, PL | 71.8 ± 2.9
IMDB-BINARY
Closeness H0, PS | 70.1 ± 2.7 | 70.1 ± 2.6 | 67.6 ± 2.4 | 70.9 ± 2.9
Betweenness H0, PS | 64.3 ± 2.3 | 64.5 ± 1.7 | 64.2 ± 2.0 | 65.0 ± 2.9
Degree2 H0, PS | 71.3 ± 2.3 | 70.7 ± 2.0 | 70.8 ± 1.6 | 71.0 ± 1.6
Nested Ensemble H0, PS | 72.1 ± 2.6
Closeness H0, BF | 68.4 ± 1.9 | 67.3 ± 0.3 | 63.2 ± 2.1 | 68.8 ± 1.9
Betweenness H0, BF | 62.8 ± 2.7 | 63.0 ± 2.5 | 61.8 ± 3.4 | 63.6 ± 3.0
Degree2 H0, BF | 69.3 ± 2.8 | 70.0 ± 2.7 | 69.4 ± 2.6 | 70.8 ± 2.6
Nested Ensemble H0, BF | 79.0 ± 2.3
Closeness H0, PL | 68.3 ± 2.4 | 69.6 ± 2.7 | 66.7 ± 2.2 | 69.7 ± 2.6
Betweenness H0, PL | 63.2 ± 2.8 | 61.2 ± 1.9 | 62.0 ± 2.1 | 62.4 ± 2.7
Degree2 H0, PL | 68.9 ± 2.3 | 68.5 ± 3.6 | 69.4 ± 3.1 | 69.9 ± 2.9
Nested Ensemble H0, PL | 78.2 ± 1.7
Table 10. Graph classification accuracy (%) across benchmark datasets. The top section reports results from baseline methods, taken directly from their original publications. The bottom section presents results from the proposed nested ensemble learning framework using three summary functions (PS, BF, PL), evaluated on H 0 -based features.
Model | IMDB-BINARY | REDDIT-BINARY | IMDB-MULTI | PROTEINS
Previous Methods
2-WL-GNN [40] | 72.2 ± 3.1 | 89.4 ± 2.6 | – | 76.2 ± 3.3
GIN-0 [43] | 75.1 ± 5.1 | 92.4 ± 2.5 | 52.3 ± 2.8 | 76.2 ± 2.8
GraphSAGE [41] | 68.8 ± 4.5 | 84.3 ± 1.9 | 47.6 ± 3.5 | 75.9 ± 3.2
DGCNN [44] | 69.2 ± 3.0 | 87.8 ± 2.5 | 45.6 ± 3.4 | 72.9 ± 3.5
AWE [42] | 74.5 ± 5.8 | 87.9 ± 2.5 | 51.1 ± 3.6 | –
MP-HSM-C [48] | 74.8 ± 2.5 | – | – | 74.6 ± 2.1
Ours: Nested Ensemble Methods
Nested Ensemble (PS) H0 | 72.1 ± 2.6 | 90.3 ± 1.6 | 54.1 ± 1.1 | 78.0 ± 2.3
Nested Ensemble (BF) H0 | 79.0 ± 2.3 | 88.1 ± 1.1 | 52.1 ± 3.0 | 68.0 ± 3.3
Nested Ensemble (PL) H0 | 78.2 ± 1.7 | 83.7 ± 2.1 | 52.2 ± 2.8 | 71.8 ± 2.9
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
