Article

Identifying Network Propagation Sources Using Advanced Centrality Measures

by
Damian Frąszczak
Institute of Information Systems, Faculty of Cybernetics, Military University of Technology, 00-908 Warsaw, Poland
Entropy 2025, 27(9), 948; https://doi.org/10.3390/e27090948
Submission received: 10 May 2025 / Revised: 21 August 2025 / Accepted: 30 August 2025 / Published: 12 September 2025
(This article belongs to the Special Issue Spreading Dynamics in Complex Networks)

Abstract

We live in a time dominated by interconnected networks surrounding us on all fronts. The emergence of social media platforms has driven the expansion of social networks, facilitating fast communication worldwide. Responses to content shared on these platforms can be seen as a propagation process, where information spreads through social networks. Analyzing propagation graphs presents a significant challenge in identifying sources, which is crucial in various fields. This includes detecting the origins of disinformation, identifying patient zero in an epidemic, and tracing the initial sources of viral trends or malware. Numerous studies have attempted to identify these sources using methods similar to centrality measures which assign a value indicating the likelihood of being a source. While centrality measures are a popular topic, with many new measures introduced each year, only a few have been explored in the context of source identification. This article explores a wide range of centrality measures in the context of source identification. The results help identify the most effective measures and pave the way for the development of more efficient detection techniques. Additionally, an analysis was conducted considering multiple hops in the propagation network, providing deeper insights into the impact of extended neighborhood structures on detection performance.

1. Introduction and Research Motivation

We live in an era where networks are ubiquitous, influencing many aspects of our daily lives. The emergence and widespread adoption of social media platforms have significantly accelerated the growth of social networks, allowing instant communication and content sharing across the globe. As information spreads quickly through these networks, understanding how it spreads is very important [1,2,3,4]. Reactions to content published on social media can be understood as a form of propagation, where information spreads through interconnected nodes in the network. The problem of identifying the source, often referred to as source identification, has recently gained more attention as a strategy for effectively controlling the spread of information [5,6,7].
Identifying the source of propagation in networks is a complex challenge with significant implications, such as tracing fake news, locating “patient zero” in disease outbreaks, and uncovering the origin of viral trends and malware. Despite its importance, source identification remains a challenging task [8,9]. Various methods, including Maximum Likelihood and Maximum A Posteriori estimators, Belief Propagation, Monte Carlo simulations, and additional approaches, have been developed to address this issue [5,6,8,10,11,12,13,14]. However, only a few centrality measures have been adequately explored in the context of source identification [15,16].
The issue of source detection is well-studied in the literature [6,11,12]. Works [15,16,17] present results based on well-known node centrality measures. Additionally, several studies introduce new metrics specifically designed for the source identification problem. For example, targeted betweenness centrality [18] is calculated within subgraphs identified using the Louvain method. Work [19] proposes the distance center, defined as the sum of the shortest paths from a node to all others, with the node with the smallest value considered the source. Ref. [20] introduces the unbiased betweenness centrality, computed as standard betweenness divided by the node degree.
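The two source-specific measures mentioned above can be illustrated in a few lines. The following is a minimal, dependency-free sketch, assuming an unweighted, undirected, connected graph stored as an adjacency dictionary. The function names are illustrative and are not taken from [19,20] or from any library. Betweenness is computed with Brandes' algorithm, and the unbiased variant simply follows the description above (betweenness divided by node degree).

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def betweenness(adj):
    """Unnormalised shortest-path betweenness (Brandes' algorithm)."""
    bc = {v: 0.0 for v in adj}
    for s in adj:
        order = []                       # nodes in non-decreasing distance from s
        preds = {v: [] for v in adj}     # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}      # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while order:
            w = order.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # In an undirected graph every pair is visited from both endpoints.
    return {v: b / 2 for v, b in bc.items()}


def unbiased_betweenness(adj):
    """Betweenness divided by degree, as described for [20] (0 for isolated nodes)."""
    bc = betweenness(adj)
    return {v: (bc[v] / len(adj[v]) if adj[v] else 0.0) for v in adj}


def distance_center(adj):
    """Node with the smallest sum of shortest-path lengths, as described for [19]."""
    totals = {v: sum(bfs_distances(adj, v).values()) for v in adj}
    return min(totals, key=totals.get)


# Toy path graph 0 - 1 - 2 - 3 - 4 (hypothetical data).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
print(betweenness(path), unbiased_betweenness(path), distance_center(path))
```

On the toy path graph, the middle node has both the highest betweenness and the smallest total distance, so both measures point to it as the source candidate.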
Although various centrality-based approaches have been used for source detection [15,16,17,18,19,20,21], only a few popular ones have been widely applied. Some other techniques may have been proposed, but to the best of our knowledge, this is the first work to systematically evaluate as many as 25 different centrality measures in the context of source identification. The performance of these methods was compared with established rumor source detection techniques, such as RumorCenter [22], NetSleuth [23], and JordanCenter [24], across real and synthetic network structures using propagation simulations based on the SIR epidemiological model and the Independent Cascade model. The analysis employed traditional metrics, including the confusion matrix, precision, recall, and F1 score, as well as source detection metrics such as average distance. The findings help identify effective centrality measures for source detection and guide future research by leveraging node characteristics considered by the best centrality measures.
In summary, the key aspects of the following research are as follows:
  • A thorough analysis of advanced centrality measures for identifying the source of network propagation.
  • For each simulation model, 96 unique propagation schemes were generated. Combining the two simulation models (SIR and IC) produced 192 propagation graphs. When these were paired with 25 tested methods, the total reached 4800 experiments.
  • Evaluating source detection effectiveness through centrality measures, enhanced by including suspected sets within one- and two-hop neighborhoods, illustrates the potential of centrality for pinpointing areas suitable for more focused source detection.
  • Experiments conducted on networks with various topologies assessed the methods in terms of both computational efficiency and detection performance.
  • The study utilizes publicly available tools such as NSDLib [25] and NetCenLib [26], confirming their practical value and encouraging broader adoption in the research community.
This paper consists of six sections. The first section provides basic information and research motivation. The second section introduces social networks, the propagation process, source identification, and centrality measures. The third section details the simulation conditions and evaluation metrics, while the fourth section reviews the centrality measures used in this study. The fifth section presents the examinations conducted and the results obtained. Finally, the last section concludes the paper, summarizing key issues and suggesting future development directions.

2. Social Networks, Propagation, Source Identification, and Centrality Measures

The network is represented by an undirected graph $G = (V, E)$, where nodes $V$ (users/computers) are connected by edges $E$ that denote relationships between them [6,10]. Spreading can be initiated from one or more nodes, known as sources $v \in G$, whose positions significantly influence the speed of propagation [4,6,10,14]. Source nodes share information with neighboring nodes, encouraging their participation in the spread. Over time, as more nodes become involved, an infection/propagation graph $G_I$ forms. This graph, a subgraph of $G$, consists of the infected nodes that have participated in the spread of information through their connections (edges). Source detection methods aim to identify these source nodes based on observed patterns. This paper applies the Maximum Likelihood (ML) approach, assuming no prior information about the source nodes is available. In such a case, all nodes are considered equally likely to be the origin of the spread. This reflects a realistic scenario in many real-world networks where the infection or information source is unknown. In this scenario, $G_I$ plays a crucial role in the estimation process, as it contains indirect traces of the propagation dynamics. Even though no prior probabilities are assigned to individual nodes, the topology of this induced graph provides structural cues that source detection methods attempt to leverage to identify the most likely source. The corresponding optimization task consists of determining the nodes with the highest likelihood of being the origin of the spread:
$$\hat{v} = \arg\max_{v \in G_I} P(G_I \mid v^* = v)$$
The term $P(G_I \mid v^* = v)$ is responsible for node evaluation: it expresses how probable it is that a given node is a source. One option for evaluating nodes is to use centrality measures. Centrality measures [2,27,28] evaluate specific characteristics of network vertices and determine which vertex is the most important according to a particular measure. The calculated values are usually normalized to the range $[0, 1]$, which makes it easier to compare them across different networks and to identify similarities. In essence, a centrality measure is a function $C : V \to [0, 1]$ that provides these assessments. More details about the centrality measures used in this study will be provided further in this paper.
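One way to read the estimator above is as a ranking problem: score every node of $G_I$ with a centrality function, keep the scores in $[0, 1]$, and return the top-scoring node(s). The sketch below is illustrative only. It assumes an adjacency-dictionary graph, uses normalized degree centrality as a stand-in for $P(G_I \mid v^* = v)$, and the helper names are hypothetical rather than part of NetCenLib or NSDLib.

```python
def induced_subgraph(adj, infected):
    """Adjacency of the infection graph G_I: infected nodes and the edges among them."""
    infected = set(infected)
    return {v: [w for w in adj[v] if w in infected] for v in adj if v in infected}


def degree_centrality(adj):
    """Degree divided by (n - 1), so every value falls in [0, 1]."""
    n = len(adj)
    if n <= 1:
        return {v: 0.0 for v in adj}
    return {v: len(neighbors) / (n - 1) for v, neighbors in adj.items()}


def estimate_sources(adj, infected, k=1, centrality=degree_centrality):
    """Return the k infected nodes with the highest centrality in G_I, plus all scores.

    Ties are broken by node label, which assumes the labels are comparable.
    """
    g_i = induced_subgraph(adj, infected)
    scores = centrality(g_i)
    ranked = sorted(scores, key=lambda v: (-scores[v], v))
    return ranked[:k], scores


# Hypothetical network in which nodes 0-3 were observed as infected.
network = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0, 4], 4: [3, 5], 5: [4]}
top, scores = estimate_sources(network, infected={0, 1, 2, 3}, k=1)
print(top, scores)
```

Any other function mapping an adjacency dictionary to per-node scores can be passed as `centrality`, which is how different measures could be compared under the same selection rule.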

3. Research Background

To study the effectiveness of source identification techniques in network structures, it is essential to use propagation graph examples with labeled true sources. While the literature [29,30,31] provides some examples of such data, they are often limited by their focus on reactions to specific messages (e.g., tweets), their tree-like structure, small node counts, or a lack of information about the surrounding environment (e.g., the neighborhood) and, ultimately, about the propagation graph itself. To address these limitations, this research incorporates simulations using the SIR epidemiological model, a widely recognized approach for analyzing the effectiveness of such methods [4,6,8,10], as well as the opinion-based Independent Cascade (IC) model [32,33,34]. These models capture various aspects of disinformation spread on social networks. The SIR model reflects the general dynamics of information transmission and recovery, while the IC model simulates user-to-user influence based on individual activation [6,7,10].
Centrality measures and source identification algorithms have been validated on eight datasets, including real-world and synthetic networks. The dataset configurations are adapted from existing studies [10,35,36]. As summarized in Table 1, half of these datasets are derived from real-world networks, while the rest are synthetic, representing distinct properties of network structures. Real-world networks (Dolphin, Football, Facebook, and Social) reflect actual conditions to some extent, whereas synthetic ones (SC—scale-free and WS—small-world networks) help simulate various network properties. Moreover, the networks were selected to ensure the feasibility of loading and processing on the available hardware resources, while still reflecting diverse structural properties.
The propagation graphs were generated using a simulation-based approach with two diffusion models: the epidemiological SIR model and the IC model. Both models were applied under consistent conditions specific to each model, allowing for a fair comparison. The SIR model was adapted to emulate behaviors on real-world social media platforms, using an infection probability of 0.1 and a recovery probability of 0.05, values taken from [37] that reflect the realistic dynamics of COVID-19-related rumor propagation. The IC model adopted the simulation conditions presented in [34], with a transmission probability set to 0.4. Simulations were conducted across various networks, with 12 independent runs for each network–model combination, using initial infection sources set to 0.01%, 0.1%, 1%, and 10% of the nodes, giving 96 different propagation schemes per simulation model. In smaller networks, when the calculated number of initially infected source nodes for different infection levels (e.g., 0.01% vs. 0.1%) resulted in the same value due to rounding, a fallback mechanism was applied. Specifically, the number of source nodes was set to the greater of 2 or the ceiling of the product of the total number of nodes and the fallback infection ratio. Each simulation aimed to reach at least 40% infection coverage. This value reached up to 80% in dense networks, while it ranged between 40% and 80% in sparser ones. In each run, source nodes were randomly selected with a uniform distribution, and the process ended once the target infection size was reached.
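The simulation procedure described above can be sketched as follows. This is a simplified, standard-library illustration and not the NDLib implementation used in the study. It assumes a discrete-time SIR process in which each infected node tries to infect each never-infected neighbor with probability `beta` and then recovers with probability `gamma`, and it treats "infection coverage" as the fraction of nodes that were ever infected. The function names and the exact update order are assumptions.

```python
import math
import random


def fallback_source_count(n_nodes, fallback_ratio):
    """Fallback rule quoted above: the greater of 2 and ceil(n_nodes * fallback_ratio)."""
    return max(2, math.ceil(n_nodes * fallback_ratio))


def simulate_sir(adj, sources, beta=0.1, gamma=0.05, target_coverage=0.4,
                 rng=None, max_steps=10_000):
    """Run a simple SIR process until the target coverage is reached.

    Returns (ever_infected, reached_target). The run also stops if no node
    is infectious any more or after `max_steps` iterations.
    """
    rng = rng or random.Random()
    infected = set(sources)          # currently infectious nodes
    ever_infected = set(sources)     # infectious or recovered nodes
    target = math.ceil(target_coverage * len(adj))
    for _ in range(max_steps):
        if len(ever_infected) >= target or not infected:
            break
        newly_infected = set()
        for v in sorted(infected):   # sorted for reproducibility with a seeded rng
            for w in adj[v]:
                if w not in ever_infected and rng.random() < beta:
                    newly_infected.add(w)
        recovered = {v for v in sorted(infected) if rng.random() < gamma}
        infected = (infected - recovered) | newly_infected
        ever_infected |= newly_infected
    return ever_infected, len(ever_infected) >= target


# Hypothetical 10-node complete graph with uniformly sampled sources.
rng = random.Random(42)
graph = {v: [w for w in range(10) if w != v] for v in range(10)}
sources = rng.sample(sorted(graph), fallback_source_count(len(graph), 0.01))
ever_infected, reached = simulate_sir(graph, sources, rng=rng)
print(sources, sorted(ever_infected), reached)
```

The returned set of ever-infected nodes, together with the edges among them, would form the propagation graph on which source detection is then evaluated.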
The problem of identifying sources of propagation can be formulated as a binary classification task, where nodes are classified as either source or non-source; therefore, standard evaluation metrics based on the confusion matrix are commonly applied to source detection problems. The confusion matrix and its meaning are presented in Table 2.
Based on values obtained in the confusion matrix, the following classification metrics are computed:
  • Precision (PPV)
$$\mathrm{PPV} = \frac{TP}{TP + FP}$$
  • Recall (TPR)
$$\mathrm{TPR} = \frac{TP}{TP + FN}$$
  • F1
$$F_1 = \frac{2 \cdot \mathrm{PPV} \cdot \mathrm{TPR}}{\mathrm{PPV} + \mathrm{TPR}}$$
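As a concrete illustration of these three metrics, the following minimal sketch computes them from a set of true sources and a set of predicted sources. The function name is hypothetical, and returning 0 when a denominator is zero is an assumed convention, not something specified in the paper.

```python
def classification_metrics(true_sources, predicted_sources):
    """Confusion-matrix counts and PPV, TPR, F1 for a source detection result."""
    true_sources = set(true_sources)
    predicted_sources = set(predicted_sources)
    tp = len(true_sources & predicted_sources)   # sources correctly detected
    fp = len(predicted_sources - true_sources)   # non-sources flagged as sources
    fn = len(true_sources - predicted_sources)   # sources that were missed
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
    return {"TP": tp, "FP": fp, "FN": fn, "PPV": ppv, "TPR": tpr, "F1": f1}


# Two true sources, four suspected nodes, one of them correct.
print(classification_metrics({1, 2}, {2, 3, 4, 5}))
```

In this toy case precision is 0.25 and recall is 0.5, which shows how a larger suspected set can keep recall up while lowering precision.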
Moreover, specific metrics for source detection evaluation are used, such as the following:
  • Global Average Distance Error (GADE, hops): the average shortest distance between the real source(s), $v^*$, and the estimated ones, $\hat{v}$.
$$\mathrm{GADE} = \frac{\sum_{i=1}^{|\hat{v}|} l(i, j)_{j \in v^*}}{|\hat{v}|} + \frac{\sum_{i=1}^{|v^* \setminus \hat{v}|} l(i, j)_{j \in \hat{v}}}{|\hat{v}|}$$
where $l(i, j)$ is the shortest path length between nodes $i$ and $j$.
  • Average Detection Error (ADE): the average absolute difference between the number of detected sources and the number of true sources.
$$\mathrm{ADE} = \frac{\sum_{i=1}^{N} \mathrm{ABS}\left(|\hat{v}_i| - |v^*_i|\right)}{N}$$
where $N$ is the number of experiments.
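The ADE definition can be illustrated with a short sketch. It assumes that each experiment yields one set of detected sources and one set of true sources, and that only the sizes of these sets enter the metric, as the verbal definition above suggests. The function name is hypothetical.

```python
def average_detection_error(detected_sets, true_sets):
    """Mean absolute difference between detected and true source counts."""
    if len(detected_sets) != len(true_sets) or not true_sets:
        raise ValueError("need one detected set per true set, and at least one experiment")
    total = sum(abs(len(d) - len(t)) for d, t in zip(detected_sets, true_sets))
    return total / len(true_sets)


# Two hypothetical experiments: three detected vs one true, one detected vs two true.
print(average_detection_error([{1, 2, 3}, {4}], [{1}, {4, 5}]))
```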
The research was conducted on a system with an Intel Core i7-10700 CPU running at 2.90 GHz, 64 GB of RAM, and an SSD, operating on the Linux Ubuntu platform. The analysis utilized the RPaSDT [38] package, which integrates libraries such as NetCenLib [26] for centrality measures and NSDLib [25] for source detection methods. This toolkit facilitated the preparation of rumor propagation experiments on various network topologies using NDLib [39] and enabled the analysis of source detection methods with widely recognized diffusion models. Although a Docker-based runtime environment was available, we utilized the standalone package designed for the Linux platform to reduce unnecessary load.

4. A Review of Centrality Metrics Used in the Research

In the study, implementations of available centrality measures from the NetCenLib [26] package were utilized, and the research employed only those measures applicable to undirected networks. A list of these measures is presented in Table 3.
  • Algebraic centrality [40] measures the absolute and relative changes in a graph’s algebraic connectivity when a vertex is deleted.
  • Average distance [41] centrality refers to the average length of the shortest paths from node u to all other nodes in the network. It represents the inverse of closeness centrality.
  • Barycenter [42] quantifies a node’s centrality by the inverse of the total distance from that node to all others in the network. This measure identifies nodes centrally located in terms of overall network distance, reflecting their accessibility.
  • Betweenness [43] measures the importance of a node by calculating the ratio of the shortest paths that pass through the node to the total number of shortest paths between all pairs of nodes in a network. It highlights nodes that frequently act as bridges along the shortest paths between other nodes, indicating their crucial role in the network’s connectivity.
  • Closeness [44] measures how quickly a node can access all other nodes in a network, calculated as the inverse of the total distance from a node to all other nodes. This centrality metric often determines how rapidly information can spread from a given node to the entire network, highlighting strategically positioned nodes for efficient communication.
  • ClusterRank [45] is a local ranking algorithm that evaluates a node’s influence by considering its direct connections, neighbors’ influence, and clustering coefficient. This approach enhances the evaluation by incorporating how closely interconnected a node’s neighborhood is. This can significantly improve the assessment of a node’s strategic importance in undirected networks compared to simple degree centrality or k-core decomposition methods.
  • Coreness centrality [46], based on the k-shell indices of a node’s neighbors, is a powerful indicator of a node’s ability to disseminate information across a network. This metric evaluates a node’s connections to central or core network members, reflecting its potential influence more effectively.
  • Current-flow betweenness [47] is the average amount of current flowing through a specific vertex, calculated across all possible pairs of source and target nodes within the network. This centrality measure averages the current flow for each node, indicating how much a node acts as a conduit for the flow between various pairs in the network. It is shown to be the same as random-walk betweenness [48,49].
  • Current-flow closeness [47] transforms the traditional closeness index into a measure based on electrical current. This alternative approach calculates the distance between vertices in a network by assessing the difference in their electrical potentials. This method provides a distinctive perspective on node centrality, reflecting the ease with which current flows through different network parts. It is equivalent to information centrality [50].
  • Decay [49,51] is a centrality measure that quantifies the importance of a vertex in a network based on its proximity to all other vertices, adjusted by a decay factor. Specifically, the decay centrality of a chosen vertex in a graph is calculated by weighting the closeness of this vertex to every other vertex by a decay factor, which diminishes the influence of distance.
  • Degree measures the number of direct connections a node has in a network, indicating its importance based on how many neighbors it is directly linked to. A node with higher degree centrality is often more influential, as it interacts directly with more nodes [52].
  • Diffusion degree [53] identifies the most influential nodes in a network by considering their direct connections and their ability to spread influence to neighbors. This measure captures the cumulative impact of a node and its neighbors during the diffusion process, reaching its maximum when all neighbors are successfully activated.
  • Eigenvector [54] measures the influence of a node in a network by considering not only its direct connections but also the importance of the nodes it is connected to. It is computed from the adjacency matrix: the eigenvector associated with the largest eigenvalue provides the centrality scores, and choosing this eigenvector ensures that all of its entries are positive.
  • Geodesic K-path [27] measures the importance of a node by counting its neighbors within a geodesic path length of less than “k”.
  • Harmonic [55] is a variation of closeness centrality that handles disconnected networks using the harmonic mean of distances, which performs better than the arithmetic mean when infinite distances are present. It calculates the importance of a node as the denormalized inverse of the harmonic mean of all distances to other nodes, making it more suitable for disconnected or sparse networks.
  • Heatmap [56] combines local and global network information by comparing a node’s farness to the average farness of its neighboring nodes. A node with a smaller farness than its neighbors is considered more influential, as information is more likely to pass through it than through adjacent nodes. This makes heatmap centrality effective in identifying super-spreader nodes that control information flow within a network.
  • Leverage [57] measures a node’s influence by comparing its degree to the degrees of its neighbors, averaging the differences. A node with negative leverage centrality is influenced by its neighbors, as they are more connected. In contrast, a node with positive leverage centrality has more influence over its neighbors, who have fewer connections.
  • Lin [58] adjusts the concept of closeness by considering the average distance and the number of coreachable nodes. It is calculated by multiplying the inverse of the average distance by the square of the number of coreachable nodes, giving more importance to nodes that can reach a larger portion of the network. This modification ensures that nodes with larger coreachable sets are deemed more central, while nodes with no coreachable set have a centrality of 1 by definition.
  • Load [59] measures the significance of a node by determining the fraction of all shortest paths in the network that go through it. This reflects the node’s ability to manage flow within the network. Unlike betweenness centrality, load centrality evenly distributes flow among neighboring nodes at the shortest distance to the target. This makes it especially valuable for analyzing flow structures operating below their capacity limits.
  • MNC [60] evaluates the importance of a node by assessing the size of the largest connected group within its immediate neighbors, excluding the node itself. It measures how well-connected a node’s neighbors are, indicating the node’s influence based on the cohesion of its surrounding network.
  • PageRank [61] evaluates the relative importance of nodes in a network by analyzing the number and quality of incoming links. It is based on a modified random walk, where there is a probability of jumping to any node, ensuring the scores are distributed more evenly across the network. Nodes with more links from highly ranked nodes are considered more important, making PageRank an effective way to measure influence within a network.
  • Percolation [62] measures the importance of a node in spreading information or processes in dynamic networks, where nodes can transition between different states (e.g., infected or not). It calculates the proportion of shortest paths that pass through a node, considering its percolation state over time, making it useful for understanding the influence of spreading phenomena, such as diseases or information.
  • Radiality [63] measures how close a node is to all other nodes in the network relative to the network’s diameter. A node with high radiality centrality is generally closer to other nodes, indicating a more central position, while a low radiality suggests the node is more peripheral in the network.
  • Subgraph [64] measures how much a node participates in all the network subgraphs, accounting for all closed walks of various lengths that start and end at the node. It assigns a higher weight to shorter walks, meaning nodes involved in more local, tightly knit structures receive higher centrality. This centrality is calculated using the eigenvalues and eigenvectors of the network’s adjacency matrix.
  • Topological [65] evaluates how much a node shares its neighbors with other nodes in the network. It calculates the ratio of shared neighbors between a node and its connected nodes, plus one if they have a direct connection, divided by the total number of neighbors of the node. Nodes with few or no neighbors receive a coefficient of zero, indicating minimal shared connections.
The above analysis highlights the key characteristics of the studied centrality measures, and the obtained results facilitate the identification of critical factors for detecting propagation sources.
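To make a few of the definitions above concrete, the following sketch implements three of the listed measures for an unweighted, undirected graph given as an adjacency dictionary. It is a plain-Python illustration written for this text, not the NetCenLib implementation. Closeness follows the description above (inverse of the total distance), harmonic centrality is taken as the sum of reciprocal distances, and leverage uses the commonly cited form in which the normalized degree differences to the neighbors are averaged.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        v = queue.popleft()
        for w in adj[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """Inverse of the total hop distance from v to all reachable nodes."""
    total = sum(bfs_distances(adj, v).values())
    return 1.0 / total if total else 0.0


def harmonic(adj, v):
    """Sum of reciprocal distances; unreachable nodes contribute nothing."""
    return sum(1.0 / d for d in bfs_distances(adj, v).values() if d > 0)


def leverage(adj, v):
    """Average of (k_v - k_w) / (k_v + k_w) over the neighbors w of v."""
    k_v = len(adj[v])
    if k_v == 0:
        return 0.0
    return sum((k_v - len(adj[w])) / (k_v + len(adj[w])) for w in adj[v]) / k_v


# Toy path graph 0 - 1 - 2 (hypothetical data).
path = {0: [1], 1: [0, 2], 2: [1]}
for node in path:
    print(node, closeness(path, node), harmonic(path, node), leverage(path, node))
```

On this path graph the middle node scores highest on all three measures, and the end nodes obtain negative leverage because their only neighbor is better connected than they are.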

5. Results and Discussion

The research used simulations based on previously outlined conditions. The NetCenLib and NSDLib packages offer various centrality measures and techniques for identifying propagation sources. However, some methods, such as Hubbell centrality, are primarily designed for directed graphs, and others, like Rumor centrality, were too slow for this study, taking over 17 h to process the Social network under the lowest infection coverage. A maximum execution time of one hour was established, eliminating slower methods. Figure 1 illustrates the average execution times for the methods used, as summarized in Table 3. JordanCenter and NetSleuth are referred to as JC and NT, respectively. CFB and AL are among the slowest for both propagation methods, while NT, LD, PE, and BC are slower than average but faster than the slowest methods.
The methods used for the identification task were compared based on the F1 score and average error distance. The selected nodes matched the initial number of sources, focusing on those with the highest metric values.
Figure 2 and Figure 3 present the overall results for propagation graphs generated by both simulation models, with F1 scores sorted in descending order and average distance errors in ascending order. Although the AV technique produced the best average F1 score, the small difference (0.03) makes it difficult to identify a clear best technique.
The average error distance graph revealed that LD, BC, PE, CFB, LE, PR, HA, DC, CL, RA, LIN, DE, GM, MNC, CFV, and JC performed the best, with distances of about 1.5 hops from the source, while AV was the worst, exceeding 2.5 hops. Notably, AV had the highest F1 score but the worst average distance. This can be explained by its tendency to correctly identify some sources while frequently choosing central nodes far from the true origins.
Overall, detecting the real sources exactly is a complex task, which makes high F1 scores difficult to achieve. Since the average distance error is critical for evaluation, extending the detected sources to include neighbors could enhance results [8,10,12]. It can be observed that expanding the identified source nodes by including their immediate neighbors (one hop: Figures 4, 6, 8 and 10) or even their neighbors’ neighbors (two hops: Figures 5, 7, 9 and 11) improves the accuracy of detecting the real source nodes. The results reported in the paper are based on all experiment results with both propagation models (IC and SIR), while detailed results for specific cases are presented in Appendix A and Appendix B.
In the TP measure (Figure 4 and Figure 5), more source nodes have been correctly identified due to a larger suspected set and increased hops. This is also reflected in the F1 scores (Figure 6 and Figure 7), which are higher than the original ones, with two-hop neighbors yielding similar results. However, there appears to be a limit around an F1 score of 0.11. The TPR (Figure 8 and Figure 9) metric shows significant improvement for both one-hop and two-hop nodes, and the PPV results (Figure 10 and Figure 11) decline when including two hops due to more false positives. The results suggest that using one-hop neighbors is optimal. Future improvements could go beyond the current practice of either selecting all identified nodes or only the one with the highest score [10,11,12]. Possible extensions include choosing the top-X nodes with similar or close obtained values, or even adopting approaches like [15], where an extra measure is used to identify crucial nodes. However, this optimization is beyond the scope of this paper.
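The hop-based extension discussed above can be sketched as a multi-source breadth-first search that grows the suspected set, followed by a recomputation of recall and precision. This is an illustrative sketch under the assumption that the expansion is performed on whichever graph is passed in as an adjacency dictionary. The function names are hypothetical.

```python
from collections import deque


def expand_by_hops(adj, nodes, hops):
    """All nodes within `hops` edges of any node in `nodes` (the nodes themselves included)."""
    seen = set(nodes)
    frontier = deque((v, 0) for v in seen)
    while frontier:
        v, depth = frontier.popleft()
        if depth == hops:
            continue
        for w in adj[v]:
            if w not in seen:
                seen.add(w)
                frontier.append((w, depth + 1))
    return seen


def tpr_ppv(true_sources, suspected):
    """Recall and precision of a suspected set against the true sources."""
    true_sources, suspected = set(true_sources), set(suspected)
    tp = len(true_sources & suspected)
    tpr = tp / len(true_sources) if true_sources else 0.0
    ppv = tp / len(suspected) if suspected else 0.0
    return tpr, ppv


# Toy path graph 0 - 1 - ... - 5: the true source is node 1, the detector picked node 2.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4]}
for hops in (0, 1, 2):
    suspected = expand_by_hops(path, {2}, hops)
    print(hops, sorted(suspected), tpr_ppv({1}, suspected))
```

In this toy case the first hop recovers the true source, while the second hop only adds false positives, which mirrors the recall gain and precision loss described above.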
Table 4 contains two subtables comparing performance before (Table 4a) and after (Table 4b) extending the analysis by one hop. The values are sorted by True Positive Rate (TPR), as this metric is critical in identifying as many sources as possible, which is essential in the propagation context. After the one-hop extension, there is a significant increase in TPR and True Positives (TP), with methods like LE and LD achieving TPR values of ~0.68, compared to ~0.05 in the initial setup. F1 scores also improved, with LD increasing from 0.05 to 0.101, but the differences in F1 are less pronounced than the gains in TPR. Interestingly, expanding the analysis led to changes in the rankings of methods, with LE becoming the top performer in TPR, while CFC was ranked higher in the original configuration. The best methods in terms of TPR are CFB and LD, both of which leverage features related to the shortest paths passing through a node, highlighting that this characteristic is pivotal in identifying propagation sources.
Notably, methods with a TPR above 0.65 consistently achieved relatively high F1 scores, demonstrating that better coverage correlates with a balanced precision and recall.
These results highlight that relying on a single centrality-based method may be insufficient for comprehensive source detection. The inclusion of one-hop extensions significantly improves detection but also introduces the possibility of refining the approach further. For instance, excluding boundary nodes or those with minimal likelihood of being sources could reduce false positives and improve precision. A key finding of this study is that features related to shortest paths play a crucial role in source identification. Future research should focus on integrating these extensions with selective node filtering to enhance both F1 scores and overall detection robustness.
The analysis of individual method evaluations in SIR and IC-based simulations, presented in Appendix A and Appendix B, respectively, led to interesting conclusions. Notably, the methods achieved higher metric values in the SIR model: for example, the highest F1 score was 0.16 compared to 0.105, and the TPR was 0.78 compared to 0.65. Moreover, the results show that centrality measures used for source identification yield different outcomes depending on the simulation model and network topology. This indicates that the choice of propagation model has a significant influence on the effectiveness of detection methods, underscoring the need to tailor techniques to specific spreading dynamics or network topologies.
The analysis of results by network type shows that in smaller networks, such as Dolphin (Figure 12) and Football (Figure 13), F1 scores exceed 0.175 due to the smaller number of source nodes. Similar results are observed in small-world synthetic networks (Figure 14 and Figure 15), where short distances between nodes facilitate easier tracking of information sources. In contrast, scale-free networks (Figure 16 and Figure 17) exhibit lower scores, often below 0.08, without expanding the set of suspected nodes. Larger social networks like Facebook (Figure 18) and Social (Figure 19) show a similar trend. However, expanding the suspected node set can double the scores in many cases. The better performance of small-world networks stems from shorter distances between nodes, allowing quicker identification of information sources. In scale-free networks, the presence of highly connected nodes (hubs) complicates source identification, leading to lower F1 scores. Interestingly, adding extra connections in small-world networks does not always improve outcomes, with initial source nodes often achieving higher F1 scores than extended ones. Real-world networks often display characteristics of both small-world and scale-free types. Moreover, denser networks, with more connections between nodes, improve source identification accuracy, as evidenced by lower average hop errors across methods in denser setups compared to sparser ones. This trend underscores the universal benefit of network density in improving source identification effectiveness.

6. Conclusions

This paper examines the use of various centrality measures to identify propagation sources in networks. The research was conducted on both real-world and generated social networks, with the propagation process simulated using two propagation models: the SIR and IC models. Each simulation began with randomly chosen source nodes, leading to a total of 96 propagation simulations for each network type and propagation model. Across the two models (SIR and IC), this produced 192 propagation graphs, which, evaluated with 25 different methods, yielded a total of 4800 experiments. These simulations provided the propagation graphs needed to assess the various techniques used for source identification.
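The experiment count stated above follows from simple multiplication. The short check below reproduces it; the variable names are chosen here for illustration only.

```python
# Counts taken from the paragraph above; variable names are illustrative.
simulations_per_model = 96   # propagation simulations reported per propagation model
propagation_models = 2       # SIR and IC
methods = 25                 # centrality measures evaluated

# Each simulation yields one propagation graph.
propagation_graphs = simulations_per_model * propagation_models
# Every method is run once on every propagation graph.
experiments = propagation_graphs * methods

print(propagation_graphs)  # 192
print(experiments)         # 4800
```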
The analysis demonstrated that a diverse set of state-of-the-art centrality measures yielded better results for source identification than baseline methods specifically designed for this purpose, such as NETSLEUTH and the Jordan Center. This finding suggests that general centrality measures may be more effective in certain contexts than specialized methods. However, some more complex approaches, like RumorCenter, could not complete all tests within the one-hour time limit per propagation graph due to their computational complexity. This limitation is significant for real-world applications, where timely network analysis is crucial.
An extended analysis that considered multiple hops in the propagation network revealed a notable improvement in source identification performance. The True Positive Rate (TPR) increased significantly, especially for centrality measures based on shortest paths, such as LD and CFB, which proved to be the most effective techniques. These results underscore the importance of incorporating extended neighborhood structures in source detection. Furthermore, the correlation between high TPR and balanced F1 scores suggests that methods leveraging shortest-path-related features are particularly well-suited for this task. This finding opens avenues for future enhancements in detection techniques, such as filtering out boundary nodes or nodes with a low likelihood of being sources. The methods also achieved varying results depending on the diffusion model that generated the propagation graphs, which leads to the conclusion that source detection methods should be selected according to the expected propagation dynamics. Although the overall F1 scores achieved were not entirely satisfactory, the study identified the most effective techniques, providing a foundation for future improvements in source detection accuracy.
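The effect of hop expansion on TPR can be illustrated with a minimal sketch. It is not the NSDLib or NetCenLib implementation: the graph is a toy adjacency dictionary, closeness centrality stands in for the measures studied in the paper, and all function names (`closeness`, `expand_by_hops`, `tpr`) are assumptions introduced for this example.

```python
from collections import deque

def bfs_distances(adj, start):
    """Hop distances from start to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nb in adj[node]:
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return dist

def closeness(adj):
    """Closeness centrality: reachable nodes divided by the sum of hop distances."""
    scores = {}
    for node in adj:
        dist = bfs_distances(adj, node)
        total = sum(dist.values())
        scores[node] = (len(dist) - 1) / total if total else 0.0
    return scores

def expand_by_hops(adj, nodes, hops):
    """Return the given nodes plus every node within `hops` hops of them."""
    expanded = set(nodes)
    frontier = set(nodes)
    for _ in range(hops):
        frontier = {nb for n in frontier for nb in adj[n]} - expanded
        expanded |= frontier
    return expanded

def tpr(detected, true_sources):
    """True positive rate: fraction of true sources that were detected."""
    return len(set(detected) & set(true_sources)) / len(true_sources)

# Toy infected subgraph (hypothetical): path 0-1-2-3-4 with a branch 2-5.
adj = {0: [1], 1: [0, 2], 2: [1, 3, 5], 3: [2, 4], 4: [3], 5: [2]}
true_sources = {1}

scores = closeness(adj)
detected = {max(scores, key=scores.get)}       # highest-ranked node only
expanded = expand_by_hops(adj, detected, 1)    # add its one-hop neighbors

print(tpr(detected, true_sources), tpr(expanded, true_sources))
```

In this toy case the top-ranked node is adjacent to the true source, so the direct prediction misses it while the one-hop expansion recovers it, at the cost of a larger suspected set and therefore lower precision.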
One potential direction for future work involves dividing the network into smaller subgroups centered around key propagation hubs, allowing for localized source identification. This approach, often referred to as propagation outbreak detection [14], can be addressed with community detection methods like Louvain, Leiden, or BLOCD [66,67,68]. By narrowing the detection scope to smaller, structurally coherent regions, it may be possible to improve accuracy while reducing computational complexity [69]. Another way to leverage the findings is to apply effective centrality measures to identify candidate nodes within a one-hop neighborhood of the infected region. These nodes can serve as initial indicators of potential source locations. In the next step, a more detailed analysis (such as collecting timestamp data or applying computationally intensive methods) can focus only on this reduced area. This two-stage process can significantly lower computational costs while maintaining or even improving detection accuracy. Finally, the flexibility and openness of the developed tools, NetCenLib and NSDLib, provide a solid foundation for further experimentation and extension with new centrality methods such as Biharmonic distance [70] and others [49]. These libraries can be expanded with new detection algorithms, alternative centrality metrics, or domain-specific heuristics to support both academic research and practical applications in network monitoring and rumor source containment.
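The two-stage idea can be sketched as follows, under stated assumptions: degree serves as the inexpensive first-stage measure, and eccentricity (the criterion behind the Jordan Center) stands in for the costlier second-stage analysis. The function names and the toy graph are hypothetical and do not come from NSDLib.

```python
from collections import deque

def eccentricity(adj, node):
    """Largest hop distance from node to any reachable node (one BFS per call)."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        cur = queue.popleft()
        for nb in adj[cur]:
            if nb not in dist:
                dist[nb] = dist[cur] + 1
                queue.append(nb)
    return max(dist.values())

def two_stage_detect(adj, top_k=1):
    """Stage 1 narrows the search area cheaply; stage 2 scores only that area."""
    # Stage 1: rank by degree, keep the top_k nodes and their one-hop neighbors.
    ranked = sorted(adj, key=lambda n: len(adj[n]), reverse=True)
    seeds = ranked[:top_k]
    candidates = set(seeds)
    for seed in seeds:
        candidates.update(adj[seed])
    # Stage 2: run the costlier measure on candidates only (lowest eccentricity wins).
    best = min(sorted(candidates), key=lambda n: eccentricity(adj, n))
    return candidates, best

# Toy infected subgraph (hypothetical): hub 0 with leaves 1-3, plus a path 0-4-5-6-7.
adj = {
    0: [1, 2, 3, 4], 1: [0], 2: [0], 3: [0],
    4: [0, 5], 5: [4, 6], 6: [5, 7], 7: [6],
}

candidates, best = two_stage_detect(adj, top_k=1)
print(sorted(candidates), best)
```

Here the second stage runs five breadth-first searches instead of eight. On large propagation graphs the saving depends on how small the candidate region is relative to the infected subgraph.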

Funding

This work was financed by the Military University of Technology, under the research project UGB 531-000023-W500-22/2025.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Results for SIR-Based Simulation

Figure A1. Global F1 score per measure in the SIR-based simulation.
Figure A2. Global average distance per measure in the SIR-based simulation.
Figure A3. Average TP score before and after expanding detected nodes by one hop per method in the SIR-based simulation.
Figure A4. Average TP score before and after expanding detected nodes by two hops per method in the SIR-based simulation.
Figure A5. Average F1 score before and after expanding detected nodes by one hop per method in the SIR-based simulation.
Figure A6. Average F1 score before and after expanding detected nodes by two hops per method in the SIR-based simulation.
Figure A7. Average TPR score before and after expanding detected nodes by one hop per method in the SIR-based simulation.
Figure A8. Average TPR score before and after expanding detected nodes by two hops per method in the SIR-based simulation.
Figure A9. Average PPV score before and after expanding detected nodes by one hop per method in the SIR-based simulation.
Figure A10. Average PPV score before and after expanding detected nodes by two hops per method in the SIR-based simulation.
Figure A11. Average F1 score before and after expanding detected nodes by one hop for the Dolphin network in the SIR-based simulation.
Figure A12. Average F1 score before and after expanding detected nodes by one hop for the Football network in the SIR-based simulation.
Figure A13. Average F1 score before and after expanding detected nodes by one hop for the SW-1 network in the SIR-based simulation.
Figure A14. Average F1 score before and after expanding detected nodes by one hop for the SW-2 network in the SIR-based simulation.
Figure A15. Average F1 score before and after expanding detected nodes by one hop for the SF-1 network in the SIR-based simulation.
Figure A16. Average F1 score before and after expanding detected nodes by one hop for the SF-2 network in the SIR-based simulation.
Figure A17. Average F1 score before and after expanding detected nodes by one hop for the Facebook network in the SIR-based simulation.
Figure A18. Average F1 score before and after expanding detected nodes by one hop for the Social network in the SIR-based simulation.

Appendix B. Results for IC-Based Simulation

Figure A19. Global F1 score per measure in the IC-based simulation.
Figure A20. Global average distance per measure in the IC-based simulation.
Figure A21. Average TP score before and after expanding detected nodes by one hop per method in the IC-based simulation.
Figure A22. Average TP score before and after expanding detected nodes by two hops per method in the IC-based simulation.
Figure A23. Average F1 score before and after expanding detected nodes by one hop per method in the IC-based simulation.
Figure A24. Average F1 score before and after expanding detected nodes by two hops per method in the IC-based simulation.
Figure A25. Average TPR score before and after expanding detected nodes by one hop per method in the IC-based simulation.
Figure A26. Average TPR score before and after expanding detected nodes by two hops per method in the IC-based simulation.
Figure A27. Average PPV score before and after expanding detected nodes by one hop per method in the IC-based simulation.
Figure A28. Average PPV score before and after expanding detected nodes by two hops per method in the IC-based simulation.
Figure A29. Average F1 score before and after expanding detected nodes by one hop for the Dolphin network in the IC-based simulation.
Figure A30. Average F1 score before and after expanding detected nodes by one hop for the Football network in the IC-based simulation.
Figure A31. Average F1 score before and after expanding detected nodes by one hop for the SW-1 network in the IC-based simulation.
Figure A32. Average F1 score before and after expanding detected nodes by one hop for the SW-2 network in the IC-based simulation.
Figure A33. Average F1 score before and after expanding detected nodes by one hop for the SF-1 network in the IC-based simulation.
Figure A34. Average F1 score before and after expanding detected nodes by one hop for the SF-2 network in the IC-based simulation.
Figure A35. Average F1 score before and after expanding detected nodes by one hop for the Facebook network in the IC-based simulation.
Figure A36. Average F1 score before and after expanding detected nodes by one hop for the Social network in the IC-based simulation.

References

  1. Xiao, H.-B.; Hu, F.; Li, P.-Y.; Song, Y.-R.; Zhang, Z.-K. Information Propagation in Hypergraph-Based Social Networks. Entropy 2024, 26, 957.
  2. Dey, P.; Bhattacharya, S.; Roy, S. A Survey on the Role of Centrality as Seed Nodes for Information Propagation in Large Scale Network. ACM/IMS Trans. Data Sci. 2021, 2, 24.
  3. Frąszczak, D. Information Propagation in Online Social Networks—A Simulation Case Study. In Proceedings of the 38th International Business Information Management Association (IBIMA), Seville, Spain, 23–24 November 2021; International Business Information Management: King of Prussia, PA, USA, 2021.
  4. Jiang, J.; Wen, S.; Yu, S.; Xiang, Y.; Zhou, W. Identifying Propagation Sources in Networks: State-of-the-Art and Comparative Studies. IEEE Commun. Surv. Tutor. 2017, 19, 465–481.
  5. Meel, P.; Vishwakarma, D.K. Fake news, rumor, information pollution in social media and web: A contemporary survey of state-of-the-arts, challenges and opportunities. Expert Syst. Appl. 2020, 153, 112986.
  6. Frąszczak, D. Fake News Source Detection—The State of The Art Survey for Current Problems and Research. In Proceedings of the 37th International Business Information Management Association (IBIMA), Cordoba, Spain, 30–31 May 2021; International Business Information Management: King of Prussia, PA, USA, 2021.
  7. Zehmakan, A.N.; Out, C.; Khelejan, S.H. Why Rumors Spread Fast in Social Networks, and How to Stop It. arXiv 2023, arXiv:2305.08558.
  8. Jin, R.; Wu, W. Schemes of Propagation Models and Source Estimators for Rumor Source Detection in Online Social Networks: A Short Survey of a Decade of Research. Discret. Math. Algorithms Appl. 2021, 12, 2130002.
  9. Yu, Z.; Lu, S.; Wang, D.; Li, Z. Modeling and analysis of rumor propagation in social networks. Inf. Sci. 2021, 580, 857–873.
  10. Shelke, S.; Attar, V. Source detection of rumor in social network—A review. Online Soc. Netw. Media 2019, 9, 30–42.
  11. Liu, Y.; Shen, H.; Shi, L. A review of rumor detection techniques in social networks. J. Intell. Fuzzy Syst. 2022, 44, 3561–3578.
  12. Aïmeur, E.; Amri, S.; Brassard, G. Fake news, disinformation and misinformation in social media: A review. Soc. Netw. Anal. Min. 2023, 13, 30.
  13. Jiang, Y.; Wang, R.; Sun, J.; Wang, Y.; You, H.; Zhang, Y. Rumor Localization, Detection and Prediction in Social Network. IEEE Trans. Comput. Soc. Syst. 2022, 11, 3168–3178.
  14. Frąszczak, D. Detecting rumor outbreaks in online social networks. Soc. Netw. Anal. Min. 2023, 13, 91.
  15. Ali, S.S.; Anwar, T.; Rizvi, S.A.M. A Revisit to the Infection Source Identification Problem under Classical Graph Centrality Measures. Online Soc. Netw. Media 2020, 17, 100061.
  16. Das, A.; Biswas, A. Rumor Source Identification on Social Networks: A Combined Network Centrality Approach. In Progress in Advanced Computing and Intelligent Engineering; Advances in Intelligent Systems and Computing; Panigrahi, C.R., Pati, B., Pattanayak, B.K., Amic, S., Li, K.-C., Eds.; Springer: Singapore, 2021; Volume 1299, pp. 269–280.
  17. Das, K.; Sinha, S.K. Centrality measure based approach for detection of malicious nodes in twitter social network. Int. J. Eng. Technol. 2018, 7, 518.
  18. Britt, B.C.; Hayes, J.L.; Musaev, A.; Sheinidashtegol, P.; Parrott, S.; Albright, D.L. Using targeted betweenness centrality to identify bridges to neglected users in the Twitter conversation on veteran suicide. Soc. Netw. Anal. Min. 2021, 11, 40.
  19. Shah, D.; Zaman, T. Detecting sources of computer viruses in networks: Theory and experiment. In Proceedings of the ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems—SIGMETRICS ’10, New York, NY, USA, 14–18 June 2010; ACM Press: New York, NY, USA, 2010; p. 203.
  20. Comin, C.H.; Costa, L.d.F. Identifying the starting point of a spreading process in complex networks. Phys. Rev. E 2011, 84, 056105.
  21. Du, Y.; Gao, C.; Chen, X.; Hu, Y.; Sadiq, R.; Deng, Y. A new closeness centrality measure via effective distance in complex networks. Chaos Interdiscip. J. Nonlinear Sci. 2015, 25, 033112.
  22. Shah, D.; Zaman, T. Rumors in a Network: Who’s the Culprit? IEEE Trans. Inf. Theory 2011, 57, 5163–5181.
  23. Prakash, B.A.; Vreeken, J.; Faloutsos, C. Spotting Culprits in Epidemics: How Many and Which Ones? In Proceedings of the 2012 IEEE 12th International Conference on Data Mining, Brussels, Belgium, 10–13 December 2012; IEEE: Piscataway, NJ, USA, 2012; pp. 11–20.
  24. Ying, L.; Zhu, K. Diffusion Source Localization in Large Networks. In Synthesis Lectures on Learning, Networks, and Algorithms; Springer International Publishing: Cham, Switzerland, 2018.
  25. Frąszczak, D.; Frąszczak, E. NSDLib: A comprehensive python library for network source detection and evaluation. SoftwareX 2024, 28, 101950.
  26. Frąszczak, D.; Frąszczak, E. NetCenLib: A comprehensive python library for network centrality analysis and evaluation. SoftwareX 2024, 26, 101699.
  27. Borgatti, S.P.; Everett, M.G. A Graph-theoretic perspective on centrality. Soc. Netw. 2006, 28, 466–484.
  28. Das, K.; Samanta, S.; Pal, M. Study on centrality measures in social networks: A survey. Soc. Netw. Anal. Min. 2018, 8, 13.
  29. Cui, L.; Lee, D. CoAID: COVID-19 Healthcare Misinformation Dataset. arXiv 2020, arXiv:2006.00885.
  30. Kochkina, E.; Liakata, M.; Zubiaga, A. PHEME dataset for Rumour Detection and Veracity Classification. Figshare. Dataset 2018, 10, 46531457.
  31. Shu, K.; Mahudeswaran, D.; Wang, S.; Lee, D.; Liu, H. FakeNewsNet: A Data Repository with News Content, Social Context and Spatialtemporal Information for Studying Fake News on Social Media. Big Data 2021, 8, 171–188.
  32. Shakarian, P.; Bhatnagar, A.; Aleali, A.; Shaabani, E.; Guo, R. The Independent Cascade and Linear Threshold Models. In Diffusion in Social Networks; Springer Briefs in Computer Science; Springer International Publishing: Cham, Switzerland, 2015; pp. 35–48.
  33. Peralta, A.F.; Kertész, J.; Iñiguez, G. Opinion dynamics in social networks: From models to data. arXiv 2022, arXiv:2201.01322.
  34. Gray, C.; Mitchell, L.; Roughan, M. Bayesian inference of network structure from information cascades. arXiv 2019, arXiv:1908.03318.
  35. Li, Q.; Zhang, Q.; Si, L.; Liu, Y. Rumor Detection on Social Media: Datasets, Methods and Opportunities. In Proceedings of the Second Workshop on Natural Language Processing for Internet Freedom: Censorship, Disinformation, and Propaganda, Hong Kong, China, 4 November 2019; Association for Computational Linguistics: Stroudsburg, PA, USA, 2019; pp. 66–75.
  36. Murayama, T. Dataset of Fake News Detection and Fact Verification: A Survey. arXiv 2021, arXiv:2111.03299.
  37. Ju, C.; Jiang, Y.; Bao, F.; Zou, B.; Xu, C. Online Rumor Diffusion Model Based on Variation and Silence Phenomenon in the Context of COVID-19. Front. Public Health 2022, 9, 788475.
  38. Frąszczak, D. RPaSDT—Rumor Propagation and Source Detection Toolkit. SoftwareX 2022, 17, 100988.
  39. Rossetti, G.; Milli, L.; Rinzivillo, S.; Sîrbu, A.; Pedreschi, D.; Giannotti, F. NDlib: A python library to model and analyze diffusion processes over complex networks. Int. J. Data Sci. Anal. 2018, 5, 61–79.
  40. Kirkland, S. Algebraic connectivity for vertex-deleted subgraphs, and a notion of vertex centrality. Discret. Math. 2010, 310, 911–921.
  41. Del Rio, G.; Koschützki, D.; Coello, G. How to identify essential genes from molecular networks? BMC Syst. Biol. 2009, 3, 102.
  42. Viswanath, M. Ontology-Based Automatic Text Summarization. Master’s Thesis, University of Georgia, Athens, GA, USA, 2009.
  43. Brandes, U. A faster algorithm for betweenness centrality. J. Math. Sociol. 2001, 25, 163–177.
  44. Latora, V.; Marchiori, M. Efficient Behavior of Small-World Networks. Phys. Rev. Lett. 2001, 87, 198701.
  45. Chen, D.-B.; Gao, H.; Lü, L.; Zhou, T. Identifying Influential Nodes in Large-Scale Directed Networks: The Role of Clustering. PLoS ONE 2013, 8, e77455.
  46. Bae, J.; Kim, S. Identifying and ranking influential spreaders in complex networks by neighborhood coreness. Phys. A Stat. Mech. Its Appl. 2014, 395, 549–559.
  47. Brandes, U.; Fleischer, D. Centrality Measures Based on Current Flow. In Annual Symposium on Theoretical Aspects of Computer Science; Lecture Notes in Computer Science; Diekert, V., Durand, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2005; Volume 3404, pp. 533–544.
  48. Newman, M.E.J. A measure of betweenness centrality based on random walks. Soc. Netw. 2005, 27, 39–54.
  49. Jalili, M.; Salehzadeh-Yazdi, A.; Asgari, Y.; Arab, S.S.; Yaghmaie, M.; Ghavamzadeh, A.; Alimoghaddam, K.; Li, T. CentiServer: A Comprehensive Resource, Web-Based Application and R Package for Centrality Analysis. PLoS ONE 2015, 10, e0143111.
  50. Stephenson, K.; Zelen, M. Rethinking centrality: Methods and examples. Soc. Netw. 1989, 11, 1–37.
  51. Jackson, M.O. Social and Economic Networks; Princeton University Press: Princeton, NJ, USA, 2010.
  52. Chebotarev, P.; Gubanov, D. How to choose the most appropriate centrality measure? arXiv 2020, arXiv:2003.01052.
  53. Kundu, S.; Murthy, C.A.; Pal, S.K. A New Centrality Measure for Influence Maximization in Social Networks. In Pattern Recognition and Machine Intelligence; Lecture Notes in Computer Science; Kuznetsov, S.O., Mandal, D.P., Kundu, M.K., Pal, S.K., Eds.; Springer: Berlin/Heidelberg, Germany, 2011; Volume 6744, pp. 242–247.
  54. Rodriguez, J.A.; Estrada, E.; Gutierrez, A. Functional centrality in graphs. arXiv 2006, arXiv:math/0610141.
  55. Opsahl, T.; Agneessens, F.; Skvoretz, J. Node centrality in weighted networks: Generalizing degree and shortest paths. Soc. Netw. 2010, 32, 245–251.
  56. Durón, C. Heatmap centrality: A new measure to identify super-spreader nodes in scale-free networks. PLoS ONE 2020, 15, e0235690.
  57. Joyce, K.E.; Laurienti, P.J.; Burdette, J.H.; Hayasaka, S. A New Measure of Centrality for Brain Networks. PLoS ONE 2010, 5, e12200.
  58. Lin, N. Foundations of Social Research; McGraw-Hill: New York, NY, USA, 1976.
  59. Brandes, U. On variants of shortest-path betweenness centrality and their generic computation. Soc. Netw. 2008, 30, 136–145.
  60. Lin, C.-Y.; Chin, C.-H.; Wu, H.-H.; Chen, S.-H.; Ho, C.-W.; Ko, M.-T. Hubba: Hub objects analyzer—A framework of interactome hubs identification for network biology. Nucleic Acids Res. 2008, 36 (Suppl. S2), W438–W443.
  61. Brin, S.; Page, L. The anatomy of a large-scale hypertextual Web search engine. Comput. Netw. ISDN Syst. 1998, 30, 107–117.
  62. Piraveenan, M.; Prokopenko, M.; Hossain, L. Percolation Centrality: Quantifying Graph-Theoretic Impact of Nodes during Percolation in Networks. PLoS ONE 2013, 8, e53095.
  63. Valente, T.W.; Foreman, R.K. Integration and radiality: Measuring the extent of an individual’s connectedness and reachability in a network. Soc. Netw. 1998, 20, 89–105.
  64. Estrada, E.; Rodriguez-Velazquez, J.A. Subgraph Centrality in Complex Networks. arXiv 2005, arXiv:cond-mat/0504730v1.
  65. Assenov, Y.; Ramírez, F.; Schelhorn, S.-E.; Lengauer, T.; Albrecht, M. Computing topological parameters of biological networks. Bioinformatics 2008, 24, 282–284.
  66. Frąszczak, D. Leadership-oriented community detection: Enhancing accuracy in social network analysis. Inf. Sci. 2025, 718, 122421.
  67. Held, P.; Krause, B.; Kruse, R. Dynamic Clustering in Social Networks Using Louvain and Infomap Method. In Proceedings of the 2016 Third European Network Intelligence Conference (ENIC), Wroclaw, Poland, 5–7 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 61–68.
  68. Traag, V.; Waltman, L.; van Eck, N.J. From Louvain to Leiden: Guaranteeing well-connected communities. Sci. Rep. 2019, 9, 5233.
  69. Rajeh, S.; Cherifi, H. Ranking influential nodes in complex networks with community structure. PLoS ONE 2022, 17, e0273610.
  70. Liu, C.; Zehmakan, A.N.; Zhang, Z. Fast Query of Biharmonic Distance in Networks. arXiv 2024, arXiv:2408.13538.
Figure 1. Average execution time per metric in different propagation models.
Figure 1. Average execution time per metric in different propagation models.
Entropy 27 00948 g001
Figure 2. Global F1 score per measure.
Figure 2. Global F1 score per measure.
Entropy 27 00948 g002
Figure 3. Global average distance per measure.
Figure 3. Global average distance per measure.
Entropy 27 00948 g003
Figure 4. Global average TP score before and after expanding detected nodes by one hop per method.
Figure 4. Global average TP score before and after expanding detected nodes by one hop per method.
Entropy 27 00948 g004
Figure 5. Global average TP score before and after expanding detected nodes by two hops per method.
Figure 5. Global average TP score before and after expanding detected nodes by two hops per method.
Entropy 27 00948 g005
Figure 6. Global average F1 score before and after expanding detected nodes by one hop per method.
Figure 6. Global average F1 score before and after expanding detected nodes by one hop per method.
Entropy 27 00948 g006
Figure 7. Global average F1 score before and after expanding detected nodes by two hops per method.
Figure 7. Global average F1 score before and after expanding detected nodes by two hops per method.
Entropy 27 00948 g007
Figure 8. Global average TPR score before and after expanding detected nodes by one hop per method.
Figure 8. Global average TPR score before and after expanding detected nodes by one hop per method.
Entropy 27 00948 g008
Figure 9. Global average TPR score before and after expanding detected nodes by two hops per method.
Figure 9. Global average TPR score before and after expanding detected nodes by two hops per method.
Entropy 27 00948 g009
Figure 10. Global average PPV score before and after expanding detected nodes by one hop per method.
Figure 10. Global average PPV score before and after expanding detected nodes by one hop per method.
Entropy 27 00948 g010
Figure 11. Global PPV score before and after expanding detected nodes by two hops per method.
Figure 11. Global PPV score before and after expanding detected nodes by two hops per method.
Entropy 27 00948 g011
Figure 12. Global average F1 score before and after expanding detected nodes by one hop for the Dolphin network.
Figure 12. Global average F1 score before and after expanding detected nodes by one hop for the Dolphin network.
Entropy 27 00948 g012
Figure 13. Global average F1 score before and after expanding detected nodes by one hop for the Football network.
Figure 13. Global average F1 score before and after expanding detected nodes by one hop for the Football network.
Entropy 27 00948 g013
Figure 14. Global average F1 score before and after expanding detected nodes by one hop for the SW-1 network.
Figure 14. Global average F1 score before and after expanding detected nodes by one hop for the SW-1 network.
Entropy 27 00948 g014
Figure 15. Global average F1 score before and after expanding detected nodes by one hop for the SW-2 network.
Figure 15. Global average F1 score before and after expanding detected nodes by one hop for the SW-2 network.
Entropy 27 00948 g015
Figure 16. Global average F1 score before and after expanding detected nodes by one hop for the SF-1 network.
Figure 16. Global average F1 score before and after expanding detected nodes by one hop for the SF-1 network.
Entropy 27 00948 g016
Figure 17. Global average F1 score before and after expanding detected nodes by one hop for the SF-2 network.
Figure 17. Global average F1 score before and after expanding detected nodes by one hop for the SF-2 network.
Entropy 27 00948 g017
Figure 18. Global average F1 score before and after expanding detected nodes by one hop for the Facebook network.
Figure 18. Global average F1 score before and after expanding detected nodes by one hop for the Facebook network.
Entropy 27 00948 g018
Figure 19. Global average F1 score before and after expanding detected nodes by one hop for the Social network.
Table 1. The networks and their analysis used in the study.
| Network | Nodes | Edges | Density | Assortativity | Avg. Clustering Coefficient | Degree (min/avg./max) |
|---|---|---|---|---|---|---|
| Dolphin | 62 | 159 | 0.0841 | −0.0436 | 0.259 | 1/5.13/12 |
| Football | 115 | 613 | 0.0935 | 0.1624 | 0.4032 | 7/10/12 |
| SF-1 | 500 | 2475 | 0.0198 | −0.0966 | 0.0659 | 5/9.9/69 |
| SW-1 | 500 | 2500 | 0.0200 | −0.0244 | 0.1640 | 5/10.0/16 |
| SF-2 | 1000 | 4975 | 0.0100 | −0.0613 | 0.0423 | 5/9.95/126 |
| SW-2 | 1000 | 5000 | 0.0100 | −0.0061 | 0.1478 | 5/10.0/16 |
| Facebook | 4039 | 88,234 | 0.0108 | 0.0636 | 0.6055 | 1/44/1045 |
| Social | 12,600 | 671,000 | 0.0008 | −0.1219 | 0.2275 | 1/10/8700 |
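The density and average-degree columns can be cross-checked from the node and edge counts alone. The sketch below assumes the standard definitions for a simple undirected graph, density = 2E / (N(N − 1)) and average degree = 2E / N. The function names are illustrative and are not taken from the article.

```python
def density(nodes: int, edges: int) -> float:
    """Density of a simple undirected graph: 2E / (N * (N - 1))."""
    if nodes < 2:
        return 0.0
    return 2 * edges / (nodes * (nodes - 1))


def average_degree(nodes: int, edges: int) -> float:
    """Mean degree of an undirected graph: 2E / N."""
    return 2 * edges / nodes if nodes else 0.0


if __name__ == "__main__":
    # (name, nodes, edges) copied from three rows of Table 1.
    rows = [("Dolphin", 62, 159), ("SF-1", 500, 2475), ("Facebook", 4039, 88234)]
    for name, n, e in rows:
        print(f"{name}: density={density(n, e):.4f}, "
              f"avg. degree={average_degree(n, e):.2f}")
```

For the Dolphin row, for example, this gives a density of about 0.0841 and an average degree of about 5.13, matching the tabulated values.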
Table 2. Confusion matrix explanation.
| | Predicted Positive | Predicted Negative |
|---|---|---|
| Actually positive | TP (true positive) | FN (false negative) |
| Actually negative | FP (false positive) | TN (true negative) |
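The scores reported in Table 4 (PPV, TPR, and F1) follow from these confusion-matrix counts. A minimal sketch, assuming the usual definitions PPV = TP / (TP + FP), TPR = TP / (TP + FN), and F1 as their harmonic mean; the helper names `confusion_counts` and `detection_metrics` are hypothetical and not part of the article.

```python
def confusion_counts(true_sources: set, detected: set) -> tuple:
    """Return (TP, FP, FN) for a set of detected nodes versus the true sources."""
    tp = len(true_sources & detected)
    fp = len(detected - true_sources)
    fn = len(true_sources - detected)
    return tp, fp, fn


def detection_metrics(tp: int, fp: int, fn: int) -> dict:
    """Compute PPV (precision), TPR (recall), and F1; 0.0 when undefined."""
    ppv = tp / (tp + fp) if tp + fp else 0.0
    tpr = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * ppv * tpr / (ppv + tpr) if ppv + tpr else 0.0
    return {"PPV": ppv, "TPR": tpr, "F1": f1}


if __name__ == "__main__":
    # Toy case: two true sources, two detected nodes, one overlap.
    tp, fp, fn = confusion_counts({"a", "b"}, {"b", "c"})
    print(detection_metrics(tp, fp, fn))
```

When a method returns exactly as many nodes as there are true sources, FP equals FN and the three scores coincide. That would be consistent with the identical F1, TPR, and PPV columns in Table 4(a), although this reading is an inference from the numbers rather than something stated in the tables.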
Table 3. List of the centrality measures used in this research.
| Centrality Measure | | |
|---|---|---|
| Algebraic [AL] | Decay [DC] | Load [LD] |
| Average distance [AV] | Degree [DE] | MNC [MNC] |
| Barycenter [BR] | Diffusion degree [DD] | PageRank [PR] |
| Betweenness [BC] | Eigenvector [EI] | Percolation [PE] |
| Closeness [CL] | Geodesic k-path [GK] | Radiality [RA] |
| Cluster rank [CR] | Harmonic [HA] | Subgraph [SU] |
| Coreness [CO] | Heatmap [HE] | Topological [TO] |
| Current flow betweenness (random walk betweenness) [CFB] | Leverage [LE] | |
| Current flow closeness (information centrality) [CFC] | Lin [LIN] | |
Table 4. Performance metrics of propagation source detection methods for all experiments: comparison of average TP, F1, TPR, and PPV. (a) Directly identified sources. (b) Sources extended by one hop.
(a)

| Method | TP | F1 | TPR | PPV |
|---|---|---|---|---|
| AV | 12.083 | 0.073 | 0.073 | 0.073 |
| TO | 13.146 | 0.071 | 0.071 | 0.071 |
| HE | 13.333 | 0.069 | 0.069 | 0.069 |
| CO | 10.510 | 0.063 | 0.063 | 0.063 |
| LD | 6.927 | 0.054 | 0.054 | 0.054 |
| PE | 6.906 | 0.051 | 0.051 | 0.051 |
| BC | 6.906 | 0.051 | 0.051 | 0.051 |
| JC | 6.760 | 0.051 | 0.051 | 0.051 |
| CFB | 6.406 | 0.051 | 0.051 | 0.051 |
| SU | 6.313 | 0.050 | 0.050 | 0.050 |
| BR | 7.042 | 0.050 | 0.050 | 0.050 |
| DD | 6.521 | 0.049 | 0.049 | 0.049 |
| MNC | 6.500 | 0.048 | 0.048 | 0.048 |
| LE | 6.667 | 0.047 | 0.047 | 0.047 |
| DE | 6.469 | 0.046 | 0.046 | 0.046 |
| AL | 6.333 | 0.046 | 0.046 | 0.046 |
| EI | 6.333 | 0.046 | 0.046 | 0.046 |
| PR | 6.167 | 0.046 | 0.046 | 0.046 |
| GK | 6.552 | 0.045 | 0.045 | 0.045 |
| CFC | 6.427 | 0.045 | 0.045 | 0.045 |
| HA | 6.500 | 0.045 | 0.045 | 0.045 |
| CL | 6.365 | 0.043 | 0.043 | 0.043 |
| LIN | 6.365 | 0.043 | 0.043 | 0.043 |
| RA | 6.365 | 0.043 | 0.043 | 0.043 |
| DC | 6.406 | 0.041 | 0.041 | 0.041 |
| CR | 7.219 | 0.041 | 0.041 | 0.041 |
| NT | 6.021 | 0.040 | 0.040 | 0.040 |
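(b)

| Method | TP | F1 | TPR | PPV |
|---|---|---|---|---|
| LD | 103.969 | 0.101 | 0.677 | 0.056 |
| GK | 90.250 | 0.101 | 0.594 | 0.056 |
| LE | 101.635 | 0.100 | 0.681 | 0.055 |
| PE | 103.958 | 0.100 | 0.675 | 0.055 |
| BC | 103.958 | 0.100 | 0.675 | 0.055 |
| LIN | 88.948 | 0.099 | 0.616 | 0.055 |
| RA | 88.948 | 0.099 | 0.616 | 0.055 |
| CL | 88.948 | 0.099 | 0.616 | 0.055 |
| CFB | 100.760 | 0.099 | 0.660 | 0.054 |
| HA | 92.604 | 0.098 | 0.634 | 0.054 |
| JC | 91.521 | 0.098 | 0.568 | 0.055 |
| PR | 99.906 | 0.098 | 0.664 | 0.053 |
| CFC | 90.948 | 0.098 | 0.599 | 0.054 |
| DC | 92.063 | 0.097 | 0.619 | 0.054 |
| DE | 97.427 | 0.095 | 0.637 | 0.052 |
| HE | 22.146 | 0.095 | 0.248 | 0.063 |
| MNC | 95.573 | 0.094 | 0.623 | 0.052 |
| NT | 68.396 | 0.092 | 0.498 | 0.052 |
| TO | 20.500 | 0.092 | 0.222 | 0.063 |
| BR | 78.146 | 0.092 | 0.464 | 0.052 |
| DD | 86.375 | 0.092 | 0.560 | 0.051 |
| SU | 76.010 | 0.090 | 0.474 | 0.052 |
| AL | 76.656 | 0.090 | 0.490 | 0.051 |
| EI | 76.656 | 0.090 | 0.490 | 0.051 |
| AV | 21.031 | 0.090 | 0.232 | 0.060 |
| CO | 20.531 | 0.086 | 0.335 | 0.055 |
| CR | 24.458 | 0.082 | 0.305 | 0.049 |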
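The contrast between parts (a) and (b) of Table 4 (for LD, TPR rises from 0.054 to 0.677 while PPV stays near 0.05) is what one would expect when each detected node is enlarged to include its neighbours. The exact expansion procedure is not given in this section, so the sketch below is only an assumption: it treats "extended by one hop" as adding every direct neighbour of a detected node. The adjacency-dict representation and the name `expand_by_hops` are hypothetical.

```python
def expand_by_hops(adj: dict, detected: set, hops: int = 1) -> set:
    """Return the detected nodes plus all nodes within `hops` edges of them.

    `adj` maps each node to an iterable of its neighbours (undirected graph).
    """
    expanded = set(detected)
    frontier = set(detected)
    for _ in range(hops):
        # Neighbours of the current frontier that have not been added yet.
        frontier = {nb for node in frontier for nb in adj.get(node, ())} - expanded
        expanded |= frontier
    return expanded


if __name__ == "__main__":
    # Toy path graph 0-1-2-3-4 with true source 2 and a near-miss detection at 1.
    adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
    true_sources, detected = {2}, {1}
    for hops in (0, 1):
        found = expand_by_hops(adj, detected, hops)
        tp = len(found & true_sources)
        tpr = tp / len(true_sources)
        ppv = tp / len(found)
        print(f"hops={hops}: detected={sorted(found)}, TPR={tpr:.2f}, PPV={ppv:.2f}")
```

In this toy case the near miss becomes a hit after one hop, so TPR jumps from 0 to 1, but the two extra neighbours count as false positives and hold PPV at one third. The same trade-off, under the stated assumption, would explain why the larger TP counts in part (b) do not translate into higher PPV.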
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Frąszczak, D. Identifying Network Propagation Sources Using Advanced Centrality Measures. Entropy 2025, 27, 948. https://doi.org/10.3390/e27090948
