Communication

Scientific Impact Prediction via Virtual Geography Hawkes Process

1
Graduate School of Advanced Interdisciplinary Science, Tokyo University of Agriculture and Technology, Koganei 184-8588, Tokyo, Japan
2
Intelligent Platforms Research Institute, National Institute of Advanced Industrial Science and Technology (AIST), Koto 135-0064, Tokyo, Japan
*
Author to whom correspondence should be addressed.
Appl. Sci. 2026, 16(4), 2085; https://doi.org/10.3390/app16042085
Submission received: 20 January 2026 / Revised: 11 February 2026 / Accepted: 12 February 2026 / Published: 20 February 2026
(This article belongs to the Special Issue AI for Sustainability and Innovation—2nd Edition)

Abstract

This brief proposes a novel Virtual-Geography Hawkes Process (VG-Hawkes) to model citation dynamics considering academic networks. The VG-Hawkes model incorporates academic relationships between authors as virtual distances by extending the conventional temporal Hawkes process, enabling a more detailed and realistic representation of citation behavior. Validation results on real-world datasets show that the VG-Hawkes model consistently achieves higher log-likelihood scores than temporal Hawkes models and effectively captures citation peaks and distributional patterns. While this study focuses on selected datasets and pairwise interactions, the model is general and readily extensible. Future work includes scaling to broader datasets and incorporating more complex author relationships. The VG-Hawkes model provides a novel and flexible framework for academic network analysis and scientific impact prediction.

1. Introduction

1.1. Background

There is growing demand for methods that can identify high-quality recent research within the rapidly growing body of literature [1,2,3]. Citation count is widely used as a fundamental metric for evaluating a researcher’s scientific impact. Popular indicators include the impact factor [4], h-index [5], and g-index [6]. The impact factor reflects the average number of citations received per year by articles published in a journal [7], while the h-index is defined as the largest number $h$ such that a researcher has $h$ papers each cited at least $h$ times. The g-index refines this by identifying the largest $g$ such that the top $g$ articles collectively have at least $g^2$ citations. These citation-based indices are commonly adopted in academic hiring, promotion, and funding decisions, as they provide quantitative estimates of scientific importance and influence. However, such metrics primarily capture past performance and cannot predict the future impact of a publication. For instance, even papers published in the same journal can diverge widely in their long-term citation counts, ranging from a handful to over a thousand. This variability illustrates that high current metrics do not guarantee sustained influence, while researchers with modest early metrics may later produce groundbreaking work. Consequently, metrics like the impact factor and h-index, which reflect only current evaluation, may not offer fair or forward-looking criteria for assessment.
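The h-index and g-index definitions above translate directly into a few lines of code. The following is a minimal Python sketch; the function names `h_index` and `g_index` and the sample citation counts are illustrative assumptions, and the g-index is capped here at the number of papers.

```python
def h_index(citations):
    """Largest h such that h papers each have at least h citations."""
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, cites in enumerate(ranked, start=1):
        if cites >= rank:
            h = rank
        else:
            break
    return h


def g_index(citations):
    """Largest g such that the top g papers together have at least g**2 citations."""
    ranked = sorted(citations, reverse=True)
    total, g = 0, 0
    for rank, cites in enumerate(ranked, start=1):
        total += cites  # cumulative citations of the top `rank` papers
        if total >= rank * rank:
            g = rank
    return g


# Hypothetical citation counts for one researcher (illustrative only).
example = [10, 8, 5, 4, 3]
print(h_index(example), g_index(example))  # prints: 4 5
```

Both indices depend only on citations already received, which is why they describe past performance rather than future impact.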

1.2. Research Aim and Objectives

This study aims to develop a structure-aware point-process framework for modeling and analyzing citation dynamics by extending spatiotemporal Hawkes processes with virtual academic proximity that reflects author relationships (VG-Hawkes).
To achieve this aim, we pursue the following objectives: (i) formulate a Hawkes-process intensity in which excitation is modulated by virtual distances derived from academic relationships rather than physical geography; (ii) provide a modeling mechanism that captures both self-excitation and mutual excitation among authors/publications under the same probabilistic framework; and (iii) demonstrate feasibility on real-world citation records by showing improved likelihood-based fit and interpretable intensity patterns compared with temporal Hawkes baselines.

1.3. Contributions

Existing approaches to academic citation modeling often enhance point-process or temporal models by introducing author/paper representations as covariates (e.g., embedding-based features), by imposing network kernels that encode observed graph proximity, or by directly parameterizing relational intensities using pre-defined links or similarity measures. While effective in their respective settings, these designs typically treat the network structure as either an external feature source or a fixed coupling mechanism.
In contrast, our notion of virtual geography provides a unified and explicitly interpretable mechanism to organize heterogeneous academic interactions into a latent “distance” space that directly modulates excitation in a Hawkes-process intensity. Rather than only injecting embeddings as static covariates or prescribing a specific network kernel, virtual geography induces a structured interaction map that governs how events propagate across entities and interaction types, enabling a consistent treatment of citations and other interaction channels under a single temporal point-process framework.
The main contributions are summarized as follows.
C1. Virtual-geography Hawkes modeling. We propose a Hawkes-process formulation in which excitation is modulated by a virtual-geography structure, offering an interpretable latent organization of interaction strength beyond fixed graph kernels or purely feature-driven covariates.
C2. Unified modeling of heterogeneous interactions. We provide a unified event-driven framework that can incorporate multiple academic interaction channels (e.g., citations and collaboration-related signals) through a shared intensity construction, instead of handling each relation with separate ad hoc models.
C3. Learning procedure and feasibility demonstration. We present a practical learning pipeline for the proposed model and provide an initial empirical validation demonstrating feasibility and the added value of the virtual-geography mechanism within the Communication scope.

2. Related Work

2.1. Citation-Impact and Scientometric Models

Forecasting long-term citation impact is valuable for research assessment, funding decisions, and discovery systems that rely on citation-driven relevance signals. Beyond descriptive indices (e.g., impact factor, h-index), several predictive models have been proposed to explain and forecast citation accumulation. For example, Wang et al. [8] introduced a generative model combining a publication’s fitness and longevity to support long-horizon citation forecasting, and Sinatra et al. [9] proposed the Q-model to quantify an author’s intrinsic scientific ability and its effect on impact trajectories. While influential, these approaches typically do not explicitly model relational excitation mechanisms induced by academic interactions such as co-authorship ties and citation-network structure, which are central to network-driven propagation of influence.

2.2. Point-Process and Hawkes-Process Modeling of Citation Dynamics

Temporal point processes provide a principled probabilistic framework for modeling event-time data and have been widely used to capture self-excitation, where past events increase the likelihood of future events. In particular, Hawkes processes [10] offer an interpretable mechanism for event clustering through a baseline intensity and an excitation kernel. This paradigm has been adopted in multiple domains and is naturally suited for modeling citation events over time because citations often exhibit bursts and long-tail decay. Likelihood-based learning procedures, including EM-type methods, further make Hawkes models attractive for data-driven inference.

2.3. Network-Aware Extensions and Representation-Based Approaches

A key challenge in citation modeling is to incorporate academic relationships, such as co-authorship, topical similarity, and influence between communities, in a way that goes beyond purely temporal recurrence [11,12]. One direction augments temporal models with network-derived covariates or representation learning (e.g., author/paper embeddings) to capture latent similarity, while another direction introduces relational coupling via graph-based kernels or interaction functions that depend on distances or proximities in an underlying space [13,14,15]. Relatedly, spatiotemporal Hawkes processes incorporate spatial dependence through physical distance [16,17], which can be effective when geography is meaningful; however, in academic networks, physical proximity is often less relevant than latent scholarly proximity.

2.4. Positioning of VG-Hawkes

Motivated by the above, we propose the Virtual-Geography Hawkes Process (VG-Hawkes), which adapts the spatiotemporal Hawkes principle [17,18] to academic networks by replacing physical distance with virtual academic distance. This design provides an interpretable mechanism to modulate excitation by latent scholarly proximity, enabling a unified event-driven model that captures both temporal clustering and network-structured mutual excitation. Compared with approaches that treat network structure only as external features or fixed graph proximity, VG-Hawkes introduces a distance-based interaction map tailored to academic relationships, while retaining likelihood-based inference and extensibility within the Hawkes framework.

3. Method

3.1. Concept

We conceptualize academic network formation as a hierarchical clustering process, in which researchers gather around thematic topics, form collaborative clusters, and establish new co-authorship ties through scholarly interaction. We hypothesize that the structural evolution of these networks is driven by both self-excitation (e.g., influence from a researcher’s own prior work) and mutual excitation (e.g., influence from others’ work or external publications). An overview of this conceptual framework is illustrated on the left side of Figure 1. Leveraging the Virtual-Geography Hawkes Process (VG-Hawkes), our model naturally captures the group structure and scale-free characteristics observed in academic networks. Author relationships are quantitatively encoded as distances in a virtual space, enabling explicit and interpretable modeling of both citation and co-authorship dynamics. This approach offers a novel perspective for analyzing the evolving structure of academic networks. While modeling cross-disciplinary influences and interactions between otherwise unrelated authors remains a subject for future exploration, the proposed method provides a solid foundation for structurally informed citation impact prediction.

3.2. Overview of the Proposed Method

This study introduces the virtual network Hawkes process, an extension of the conventional spatiotemporal Hawkes process, designed to model academic relationships between authors as virtual spatial positions. Rather than relying on physical distances, the proposed model defines virtual distances based on citation and co-authorship relationships, enabling it to capture the latent structural characteristics of academic networks. In this way, the model jointly represents the temporal dynamics of paper citations and the academic proximity of authors. This concept is illustrated on the right side of Figure 1. The horizontal axis denotes time, and each marker (square) corresponds to an event, interpreted as a paper publication occurring at a particular time point. Events are color-coded by author: green for Author 1, red for Author 2, and blue for publications associated with both Author 1 and Author 2. This visualization reflects how events propagate and cluster over time, conditioned on virtual distances in the latent academic space.

3.3. Hawkes Process

The Hawkes process [10] is a probabilistic point process that accounts for the possibility of one event “triggering” another. This process exhibits self-exciting behavior, where the occurrence of an event temporarily increases the likelihood of subsequent events, resulting in a clustering pattern. Due to this property, the Hawkes process has been widely applied in various domains such as modeling aftershock sequences in seismology and the spread of infectious diseases. The Hawkes process consists mainly of two components: a background rate and an excitation term. Its intensity function is defined as follows:
$$\lambda(t) = \mu + \sum_{t_i < t} g(t - t_i).$$
Here, $\mu$ represents the baseline rate of spontaneous events (background events), and $g(t - t_i)$ denotes the triggering kernel that quantifies the likelihood of a direct offspring event generated by a previous event occurring at time $t_i$. Both $\mu$ and $g$ are non-negative functions. Through this excitation mechanism, the Hawkes process enables dynamic prediction of future events based on the history of past events.
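To make the intensity concrete, the sketch below evaluates $\lambda(t)$ for a short event history. The exponential kernel $g(s) = \alpha\,\omega\,e^{-\omega s}$, the function names, and all parameter values are assumptions chosen for illustration; the text above does not fix a particular kernel.

```python
import math


def exp_kernel(lag, alpha, omega):
    """Assumed triggering kernel g(s) = alpha * omega * exp(-omega * s), s > 0."""
    return alpha * omega * math.exp(-omega * lag)


def hawkes_intensity(t, event_times, mu, alpha, omega):
    """lambda(t) = mu + sum over events strictly before t of g(t - t_i)."""
    return mu + sum(
        exp_kernel(t - t_i, alpha, omega) for t_i in event_times if t_i < t
    )


# Hypothetical event times (illustrative only).
events = [1.0, 1.5, 4.0]
for t in (0.5, 1.6, 3.9, 4.1):
    print(t, hawkes_intensity(t, events, mu=0.2, alpha=0.5, omega=2.0))
```

Before the first event the intensity equals the background rate; it jumps right after each event and then decays, which is the clustering behavior described above.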
While the standard Hawkes process focuses on the temporal characteristics of events, it cannot accommodate additional attributes, such as the magnitude of an earthquake or the severity of an infectious disease case. The marked Hawkes process extends the original model by incorporating additional information (called “marks”) associated with each event. The intensity function is extended as follows:
$$\lambda(t, m) = \mu(t, m) + \sum_{k:\, t_k < t} g(t - t_k, m_k)$$
In many systems, the intensity function depends not only on time but also on spatial location. When the standard Hawkes process is extended to incorporate both temporal and spatial dimensions, the conditional intensity function is expressed as follows:
$$\lambda(t, x) = \mu(t, x) + \sum_{k:\, t_k < t} g(t - t_k, x - x_k)$$
In this spatio-temporal Hawkes model, the excitation function $g(t - t_k, x - x_k)$ represents spatial dependency based on the physical distance between the location $x_k$ of a past event and the current location $x$.
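The effect of distance on excitation can be illustrated with a small sketch. It assumes a constant background rate and a separable kernel (exponential in time, isotropic Gaussian in space); these choices, the function names, and the parameter values are illustrative assumptions, not the kernels used in the paper.

```python
import math


def st_kernel(lag, dx, alpha, omega, sigma):
    """Assumed separable kernel: exponential decay in time, Gaussian in space."""
    dim = len(dx)
    sq_dist = sum(d * d for d in dx)
    temporal = omega * math.exp(-omega * lag)
    spatial = math.exp(-sq_dist / (2.0 * sigma ** 2)) / (
        (2.0 * math.pi * sigma ** 2) ** (dim / 2.0)
    )
    return alpha * temporal * spatial


def st_intensity(t, x, events, mu, alpha, omega, sigma):
    """lambda(t, x) = mu + sum_{k: t_k < t} g(t - t_k, x - x_k), constant mu."""
    total = mu
    for t_k, x_k in events:
        if t_k < t:
            dx = [a - b for a, b in zip(x, x_k)]
            total += st_kernel(t - t_k, dx, alpha, omega, sigma)
    return total


# Hypothetical events given as (time, location) pairs.
events = [(1.0, (0.0, 0.0)), (2.0, (3.0, 0.0))]
near = st_intensity(2.5, (0.1, 0.0), events, mu=0.1, alpha=0.8, omega=1.0, sigma=0.5)
far = st_intensity(2.5, (10.0, 10.0), events, mu=0.1, alpha=0.8, omega=1.0, sigma=0.5)
print(near, far)
```

A location close to a past event receives a clear boost above the background rate, while a distant location stays essentially at the background level.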

3.4. Virtual Network Hawkes Process

In academic network structures such as citation and co-authorship relationships, physical distance is not necessarily a dominant factor influencing the likelihood of event occurrence. For instance, in citation or collaboration networks, the influence between authors is more strongly determined by academic relevance or the thematic similarity of research topics rather than by geographic proximity. To address this, we formulate the model using a Virtual Network Hawkes Process, in which the notion of “physical distance” is replaced with a “virtual distance.” The virtual distance defined in this study quantifies the academic closeness between researchers or papers. It is computed based on factors such as citation relationships and co-authorship connections. A smaller virtual distance indicates a stronger academic connection, while a larger distance represents weaker relevance. This virtual distance is then incorporated into the model as a substitute for physical spatial distance. We formulate a model that captures the citation dynamics of academic papers by integrating the Hawkes process framework with virtual academic distance. Let the virtual position of author $i$ in the Virtual-Geography Hawkes Process (VG-Hawkes) be represented by $x_a^{(i)} \in \mathbb{R}^n$, $i = 1, \dots, n+1$, and the position of publication $k$ be denoted by $x_p^{(k)} \in \mathbb{R}^n$, $k = 1, \dots, N$. These positions are learned based on prior knowledge and a collected dataset. The publication year of each paper is denoted by $t_k$, $k = 1, \dots, N$, as described in Section 3.
Remark 1 (On the construction of virtual distances). In this Communication, the virtual geography is implemented by treating the author and publication positions $\{x_a^{(i)}\}$ and $\{x_p^{(k)}\}$ as learnable latent variables that are estimated from the observed event history under the VG-Hawkes likelihood. Accordingly, the “virtual distance” used in the triggering kernel is the Euclidean distance in this latent space, i.e., $\lVert x_p - x_p^{(k)} \rVert$, which provides an interpretable notion of academic proximity induced by citation/co-authorship interactions through maximum-likelihood fitting. An explicit deterministic mapping from raw citation/co-authorship relations to virtual coordinates (e.g., via graph embedding or similarity-based construction) can also be integrated into the framework, but is beyond the scope of the present brief and will be investigated in future work.
Using these positions and publication times, we define a kernel-based intensity model as follows:
$$\lambda_{\mathrm{vgh}}(t, x_p) := \mu_0\, \mu(t, x_p) + A \sum_{k:\, t_k < t} g\big(t - t_k,\; x_p - x_p^{(k)}\big)$$
Here, $\mu(t, x_p)$ is the background rate, and $g(t - t_k, x_p - x_p^{(k)})$ represents the excitation effect. The constants $\mu_0$ and $A$ are relaxation coefficients. To incorporate citation count $m$ into the model, we define:
$$s(m) = \beta \exp(-\beta m), \qquad \kappa(m) = A \exp(\alpha m),$$
and extend the model to a marked Hawkes process as follows:
$$\lambda_{\mathrm{vgh}} := s(m)\, \mu_0\, \mu(t, x_p) + A \sum_{k:\, t_k < t} \kappa(m_k)\, g\big(t - t_k,\; x_p - x_p^{(k)},\; m_k\big)$$
Here, the probability that event j is a background event (background probability), and the probability that event j was triggered by event i (triggering probability), are given as follows [19]:
$$\varphi_j = \frac{\mu(t_j, x_j)}{\lambda(t_j, x_j)}, \qquad \rho_{ij} = \frac{\kappa(m_i)\, g(t_j - t_i,\; x_j - x_i)}{\lambda(t_j, x_j)}$$
Moreover, the sum of the background and triggering probabilities is always equal to 1:
$$\varphi_j + \sum_{i=1}^{j-1} \rho_{ij} = 1 \quad \text{for all } j$$
When an event occurs at a given point $(t, x)$, it can be interpreted that $\varphi_j$ background events are observed at $(t, x)$, and for each $i = 1, \dots, j-1$, event $i$ is responsible for generating $\rho_{ij}$ direct offspring events at $(t_j, x_j)$. Thus, event $j$ can be decomposed into a background event component and descendant events triggered by past events. This formulation enables the use of non-parametric methods to estimate the background rate function $\mu(t, x)$ and the triggering kernel $g(t, x)$. Based on these definitions, the model captures the influence of past publication events over time and space on the current publication intensity.
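The decomposition into background and triggering probabilities can be checked numerically. The sketch below uses the unmarked VG-Hawkes intensity with a constant background rate, an assumed kernel (exponential in time, Gaussian in virtual distance), and hypothetical event times and virtual positions; none of these values come from the paper's data.

```python
import math


def kernel(lag, dist, omega, sigma):
    """Assumed kernel: exponential decay in time, Gaussian decay in virtual distance."""
    return omega * math.exp(-omega * lag) * math.exp(-dist ** 2 / (2.0 * sigma ** 2))


def responsibilities(times, positions, mu0, mu, A, omega, sigma):
    """Return (phi, rho) for events sorted by time.

    phi[j] is the background probability of event j; rho[j][i] is the
    probability that event j was triggered by the earlier event i.
    """
    n = len(times)
    phi, rho = [], []
    for j in range(n):
        trig = [0.0] * n
        for i in range(j):  # only earlier events can trigger event j
            dist = math.dist(positions[j], positions[i])
            trig[i] = A * kernel(times[j] - times[i], dist, omega, sigma)
        lam = mu0 * mu + sum(trig)  # intensity at event j
        phi.append(mu0 * mu / lam)
        rho.append([g / lam for g in trig])
    return phi, rho


# Hypothetical publication times and virtual positions (illustrative only).
times = [0.0, 0.4, 0.5, 3.0]
positions = [(0.0, 0.0), (0.1, 0.0), (2.0, 2.0), (0.0, 0.1)]
phi, rho = responsibilities(times, positions, mu0=1.0, mu=0.3, A=0.6, omega=1.5, sigma=0.5)
for j in range(len(times)):
    print(j, round(phi[j], 3), [round(r, 3) for r in rho[j][:j]])
```

In this toy setting, the third event sits far from the others in the virtual space and is attributed almost entirely to the background, while the second event is attributed largely to the first, which is close in both time and virtual position.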

3.5. Learning Algorithm

We describe the parameter estimation procedure for the model components $\mu(t, x_p)$, $g(t, x_p)$, $\mu_0$, and $A$, based on observed data. The estimation is performed using an Expectation-Maximization (EM) algorithm, inspired by the work of Zhuang et al. [19] and Zhuang [20]. The procedure begins by initializing all parameters and then iteratively performs two steps:
  • In the E-step, the algorithm estimates the probability that each observed event was generated by the background component or was triggered by a previous event [16].
  • In the M-step, the model parameters are updated to maximize the expected complete-data log-likelihood [21].
This iterative process continues until convergence and allows for principled, data-driven learning of both the background and triggering dynamics underlying citation behavior in the proposed VG-Hawkes model. The proposed algorithm for estimating the virtual network Hawkes process is summarized in Algorithm 1.
Remark 2 (On parameter settings). The proposed VG-Hawkes model uses kernel bandwidths (e.g., $h_t$, $h_m$) and mark-related parameters (e.g., $\alpha$, $\beta$), together with relaxation coefficients (e.g., $\mu_0$, $A$). In our experiments, these parameters are initialized following standard practice in kernel-based Hawkes estimation and then selected using a simple data-driven tuning protocol on the available event history (details and the exact values used in the experiments are provided in the revised manuscript for reproducibility). A comprehensive sensitivity analysis over these hyperparameters is an important topic, but is outside the scope of this Communication and will be pursued in future work.
Algorithm 1 EM-based Parameter Estimation for the Virtual Network Hawkes Process
  1: Initialize model components: background intensity $\mu(t, x_p)$, triggering kernel $g(\cdot)$, mark scaling $\kappa(m)$, and relaxation parameters $\mu_0$, $A$.
  2: Set tolerance $\varepsilon > 0$ and maximum iterations $I_{\max}$.
  3: Set iteration counter $r \leftarrow 0$ and compute initial log-likelihood $\ell^{(0)}$.
  4: while $r < I_{\max}$ do
  5:     E-step: estimate origin of each event.
           • For each event $j$, compute the intensity $\lambda_j := \lambda_{\mathrm{vgh}}(t_j, x_p^{(j)}; \theta)$.
           • Background responsibility: $\varphi_j := \mu_0\, \mu(t_j, x_p^{(j)}) / \lambda_j$.
           • Triggering responsibility (for each $i < j$): $\rho_{ij} := A\, \kappa(m_i)\, g(t_j - t_i,\; x_p^{(j)} - x_p^{(i)},\; m_i) / \lambda_j$.
  6:     M-step: update model parameters.
           • Update $\mu(\cdot)$ and $g(\cdot)$ using $\{\varphi_j\}$ and $\{\rho_{ij}\}$ (kernel-based updates following [19,20]).
           • Update relaxation parameters: $A \leftarrow \dfrac{N - \sum_{j=1}^{N} \varphi_j}{\sum_{j=1}^{N} G_j / \lambda_j}$, $\mu_0 \leftarrow \dfrac{N - A \cdot \sum_{j=1}^{N} G_j / \lambda_j}{\sum_{j=1}^{N} \mu_j / \lambda_j}$, where $\mu_j := \mu(t_j, x_p^{(j)})$ and $G_j := \sum_{i < j} \kappa(m_i)\, g(t_j - t_i,\; x_p^{(j)} - x_p^{(i)},\; m_i)$.
  7:     Convergence check: compute the log-likelihood $\ell^{(r+1)} := \log L(D_N, \theta)$ (see Equation (13)).
  8:     if $|\ell^{(r+1)} - \ell^{(r)}| < \varepsilon$ then
  9:         break
 10:     $r \leftarrow r + 1$
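The overall structure of the EM loop (E-step responsibilities, M-step update, log-likelihood convergence check) can be sketched on a much simpler model. The code below is not the kernel-based VG-Hawkes estimator of Algorithm 1: it assumes a purely temporal Hawkes process with constant background rate `mu`, an exponential kernel $\alpha\,\omega\,e^{-\omega s}$ with a fixed, known decay `omega`, and hypothetical event times on the window $[0, T]$.

```python
import math


def log_likelihood(times, T, mu, alpha, omega):
    """Log-likelihood of a temporal Hawkes process with an exponential kernel."""
    ll = 0.0
    for j, t_j in enumerate(times):
        lam = mu + sum(alpha * omega * math.exp(-omega * (t_j - t_i)) for t_i in times[:j])
        ll += math.log(lam)
    compensator = mu * T + alpha * sum(1.0 - math.exp(-omega * (T - t_i)) for t_i in times)
    return ll - compensator


def em_fit(times, T, omega, mu=1.0, alpha=0.5, tol=1e-8, max_iter=200):
    """EM updates for (mu, alpha) with omega held fixed; times must be sorted."""
    history = [log_likelihood(times, T, mu, alpha, omega)]
    exposure = sum(1.0 - math.exp(-omega * (T - t_i)) for t_i in times)
    for _ in range(max_iter):
        # E-step: expected number of background and triggered events.
        bg_mass, trig_mass = 0.0, 0.0
        for j, t_j in enumerate(times):
            trig = sum(alpha * omega * math.exp(-omega * (t_j - t_i)) for t_i in times[:j])
            lam = mu + trig
            bg_mass += mu / lam
            trig_mass += trig / lam
        # M-step: closed-form maximizers of the expected complete-data log-likelihood.
        mu = bg_mass / T
        alpha = trig_mass / exposure
        # Convergence check on the observed-data log-likelihood.
        history.append(log_likelihood(times, T, mu, alpha, omega))
        if abs(history[-1] - history[-2]) < tol:
            break
    return mu, alpha, history


# Hypothetical, visibly clustered event times (illustrative only).
times = [0.5, 0.7, 0.8, 2.9, 3.0, 3.2, 3.3, 6.1, 8.4, 8.5, 8.6]
mu_hat, alpha_hat, history = em_fit(times, T=10.0, omega=2.0)
print(mu_hat, alpha_hat, len(history) - 1)
```

Because both steps are exact in this simplified setting, the recorded log-likelihood values should not decrease from one iteration to the next, which is the same quantity that the convergence check in Algorithm 1 monitors.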

4. Results

4.1. Dataset

In this study, we evaluate the proposed model using a dataset collected from Google Scholar for each author, without restricting the source to any specific academic journal. During data collection, we focused on the research field of reinforcement learning. This decision was made because this field is the authors’ primary area of expertise, which makes the research content and algorithms presented in the papers easier to understand. We first identify a seed researcher in a chosen domain and then construct a co-authorship network by recursively tracing co-authorship relationships from that seed researcher. In this process, the nodes represent individual authors, and the edges represent co-authorship links. For each publication, we collect the following attributes as features: publication year, publisher, co-authors, and the number of citations. The dataset uses Mohammad Ghavamzadeh and Alessandro Lazaric as the cluster and derived authors. Tables 1–4 show the collected dataset.
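The recursive tracing of co-authorship links from a seed researcher can be sketched as a breadth-first expansion. The lookup table `coauthors_of`, the placeholder author labels, and the `max_depth` cutoff are hypothetical; the paper does not state how deep the tracing goes or how the Google Scholar records are queried.

```python
from collections import deque


def build_coauthor_network(seed, coauthors_of, max_depth):
    """Breadth-first expansion from a seed author; returns (nodes, edges)."""
    nodes = {seed}
    edges = set()
    queue = deque([(seed, 0)])
    while queue:
        author, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand authors at the depth limit
        for other in coauthors_of.get(author, ()):
            edges.add(frozenset((author, other)))  # undirected co-authorship link
            if other not in nodes:
                nodes.add(other)
                queue.append((other, depth + 1))
    return nodes, edges


# Hypothetical toy lookup standing in for collected co-author lists.
coauthors_of = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A"],
    "D": ["B", "E"],
    "E": ["D"],
}
nodes, edges = build_coauthor_network("A", coauthors_of, max_depth=2)
print(sorted(nodes))                               # ['A', 'B', 'C', 'D']
print(sorted(tuple(sorted(e)) for e in edges))     # [('A', 'B'), ('A', 'C'), ('B', 'D')]
```

Author "E" is excluded because it is three steps from the seed, which shows how the depth limit bounds the size of the resulting cluster.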
We select two author-centric clusters in reinforcement learning as a controlled proof-of-concept setting: these clusters provide sufficiently long publication histories and dense citation/co-authorship interactions to highlight the benefit of incorporating virtual academic proximity into the Hawkes intensity while keeping the evaluation transparent. We acknowledge that such a selection may emphasize centralized or mentorship-driven structures; we therefore discuss this potential bias and the expected behavior in less centralized or more interdisciplinary settings in Section 5.

4.2. Validations

The red lines in Figure 2b,c show the convergence of the log-likelihood of the marked temporal Hawkes baseline for Mohammad Ghavamzadeh and Alessandro Lazaric. The horizontal axis represents the number of iterations in the learning algorithm, and the vertical axis shows the log-likelihood $\log L(D_N, \theta)$. The red lines in Figure 3a and Figure 4a depict the correspondence between the intensity function and the citation counts. The horizontal axis indicates normalized time (i.e., the paper publication date normalized to the range from 0 to 1), and the upper vertical axis shows the intensity function.
Next, we present the results obtained using the proposed Virtual Network Hawkes Process model. This model extends the temporal Hawkes process by incorporating the academic relationships and virtual distances between authors. The blue lines in Figure 2 show the convergence behavior of the log-likelihood. Figure 2a displays the overall dataset (including both Mohammad Ghavamzadeh and Alessandro Lazaric), while Figure 2b,c show individual log-likelihoods: blue for Marked VG Hawkes and red for Marked Temporal Hawkes. The horizontal axis represents the number of learning iterations, and the vertical axis shows the log-likelihood $\log L(D_N, \theta)$, indicating the convergence of the model during training. The blue lines in Figure 3 and Figure 4 illustrate the correspondence between the intensity function and citation counts for Mohammad Ghavamzadeh and Alessandro Lazaric, respectively. In each figure, the horizontal axis represents normalized time (i.e., normalized publication timing), the upper vertical axis indicates the intensity function, and the lower vertical axis shows the normalized number of citations for each paper.
The results confirm the effectiveness of the proposed Virtual Network Hawkes Process model in capturing citation dynamics. As shown in Figure 2, the VG-Hawkes model consistently achieves higher log-likelihood values compared to the conventional temporal Hawkes model across all datasets, indicating improved fit and convergence during training. This suggests that incorporating virtual academic distances and author relationships enables the model to better represent the underlying generative structure of citation events. In Figure 3 and Figure 4, the intensity functions generated by the VG-Hawkes model more closely follow the actual citation trends over normalized time, particularly in capturing periods of heightened academic activity. This improvement is attributed to the model’s ability to account for mutual excitation effects between authors based on their latent academic proximity, rather than relying solely on temporal recurrence. By embedding structural information from citation and co-authorship networks, the VG-Hawkes framework provides a more accurate and interpretable representation of how scholarly influence propagates over time.
We use log-likelihood as the primary quantitative evaluation criterion because VG-Hawkes is a probabilistic generative point-process model learned via maximum likelihood. Therefore, log-likelihood directly measures goodness-of-fit of the observed event history under the modeled intensity and provides a standard, objective basis for comparing temporal Hawkes and VG-Hawkes under the same inference procedure. Complementary predictive evaluation is also possible by held-out forecasting (e.g., next-event time prediction or citation count prediction over a future window) and reporting prediction errors; we leave such broader predictive benchmarking to future work.
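For reference, the log-likelihood of an unmarked spatio-temporal point process observed on a window $[0, T] \times \mathcal{X}$ usually takes the standard form below. The symbols $T$ and $\mathcal{X}$ (observation horizon and spatial domain) are notation introduced for this illustration, marks are omitted, and the expression is the textbook form rather than a restatement of the paper's own likelihood equation.

```latex
\log L(D_N, \theta)
  = \sum_{j=1}^{N} \log \lambda(t_j, x_j)
  - \int_{0}^{T} \int_{\mathcal{X}} \lambda(t, x) \, \mathrm{d}x \, \mathrm{d}t
```

The first term rewards high intensity at the observed events, while the integral penalizes intensity placed where no events occurred, so a higher value indicates that the modeled intensity is concentrated where and when events were actually observed.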

5. Discussion and Limitations

This study presents the Virtual Network Hawkes Process as a novel approach to modeling citation dynamics by integrating latent academic relationships into a spatiotemporal point process framework. The empirical results demonstrate promising performance in both likelihood convergence and citation-intensity correspondence, indicating the model’s potential to capture the structural and temporal complexities of academic influence.
While the proposed method shows clear advantages over conventional temporal Hawkes processes, several limitations remain. First, this work does not yet cover the full generality of the proposed framework, such as explicitly modeling cross-disciplinary interactions or dynamically evolving author embeddings. Second, the current implementation relies on predefined datasets and background functions, which may be extended in future work through more adaptive or learning-based formulations.
Nonetheless, this paper represents an important first step toward structure-aware citation modeling. The approach is conceptually general, interpretable, and computationally tractable, offering a flexible basis for further development. As a brief report, this work aims not to exhaustively address every technical dimension, but to provide a clear direction for future research. The framework and results presented here are well-positioned to spark deeper exploration into the dynamics of academic networks and influence propagation.
The current evaluation is based on two reinforcement-learning author-centric clusters, which may over-represent hierarchical or mentorship-driven structures and thus bias citation dynamics toward stronger clustering. For less centralized authors, interdisciplinary researchers, or fields with weaker mentorship patterns, we expect the excitation to be more diffuse and the learned virtual distances to reflect weaker, broader interaction channels. Systematic validation across more authors and research domains is an important direction for future work.
The empirical evaluation in this Communication is intended as an initial feasibility study of the proposed VG-Hawkes framework, using two representative author-centric clusters within a single field to provide a controlled setting and to highlight the effect of incorporating virtual academic proximity. Accordingly, we focus on likelihood-based goodness-of-fit and intensity–citation correspondence as primary validation signals. We acknowledge that a more comprehensive empirical study would further strengthen the evidence, including (i) systematic sensitivity analysis with respect to kernel and mark-related parameters, (ii) predictive evaluation on held-out citation counts using additional metrics (e.g., count prediction error), and (iii) broader coverage across more authors and research domains. These extensions are important and will be pursued in future work.
We do not report confidence intervals for the estimated intensity functions in this Communication. Rigorous uncertainty quantification for point-process intensities typically requires additional inference procedures (e.g., bootstrap-based variability assessment or Bayesian posterior inference), and we leave this extension as future work.

6. Conclusions and Future Work

This Communication proposed a Virtual-Geography Hawkes process (VG-Hawkes) to model citation dynamics by augmenting a temporal Hawkes process with virtual academic proximity extracted from citation and co-authorship relations. By embedding structural information into the excitation mechanism, the resulting intensity captures both temporal recurrence and network-driven mutual excitation. Empirical results on two author-centric clusters suggest improved likelihood convergence and a closer correspondence between learned intensities and observed citation patterns compared with a purely temporal Hawkes baseline.
Several extensions are natural and will be pursued in future work. First, broader validation across additional authors, disciplines, and less centralized collaboration structures is necessary to assess generalization. Second, complementary predictive evaluations (e.g., held-out forecasting of citation counts or next-event prediction) and systematic sensitivity analyses for kernel/mark-related hyperparameters will strengthen the empirical evidence. Third, uncertainty quantification for intensity estimates (e.g., via bootstrap-based variability assessment or Bayesian inference) is an important direction beyond the scope of this brief report.

Author Contributions

Conceptualization, X.S. and X.L.; methodology, X.S., X.L. and B.G.; software, B.G. and X.S.; validation, B.G. and X.S.; formal analysis, B.G. and X.S.; investigation, X.S. and B.G.; resources, X.L., A.M. and K.K.; data curation, X.S.; writing—original draft preparation, B.G. and X.S.; writing—review and editing, B.G., X.L., A.M., K.K. and X.S.; visualization, X.S.; supervision, X.S.; project administration, X.S., X.L., A.M. and K.K.; funding acquisition, X.L., A.M. and K.K. All authors have read and agreed to the published version of the manuscript.

Funding

This paper is based on results obtained from the project, JPNP25006, commissioned by the New Energy and Industrial Technology Development Organization (NEDO).

Data Availability Statement

All the information is available on Google Scholar and is summarized in the tables.

Conflicts of Interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Seki, Y.; Matsuo, Y. Citation Count Prediction Using Citation Information. In Proceedings of the 25th Annual Conference of the Japanese Society for Artificial Intelligence, Morioka, Japan, 1–3 June 2011. [Google Scholar]
  2. Hirako, J.; Sasano, R.; Takeda, K. CiMaTe: Citation Count Prediction Effectively Leveraging the Main Text. arXiv 2024, arXiv:2410.04404. Available online: https://arxiv.org/html/2410.04404v1 (accessed on 20 January 2026).
  3. Mori, J.; Hara, T.; Sakaki, T.; Kajikawa, Y.; Sakata, I. Identifying Central Researchers in Emerging Research Areas via Co-authorship Network Analysis of Large-Scale Academic Data. In Proceedings of the 29th Annual Conference of the Japanese Society for Artificial Intelligence, Hakodate, Japan, 30 May–2 June 2015. [Google Scholar]
  4. Garfield, E. Citation Analysis as a Tool in Journal Evaluation. Science 1972, 178, 471–479. [Google Scholar] [CrossRef] [PubMed]
  5. Hirsch, J.E. An Index to Quantify an Individual’s Scientific Research Output. Proc. Natl. Acad. Sci. USA 2005, 102, 16569–16572. [Google Scholar] [CrossRef] [PubMed]
  6. Egghe, L. An Improvement of the h-Index: The g-Index. Scientometrics 2006, 69, 131–152. [Google Scholar] [CrossRef]
  7. Wang, L.; Du, W.; Chen, Z. Multi-Feature-Enhanced Academic Paper Recommendation Model with Knowledge Graph. Appl. Sci. 2024, 14, 5022. [Google Scholar] [CrossRef]
  8. Wang, D.; Song, C.; Barabási, A.L. Quantifying Long-Term Scientific Impact. Science 2013, 342, 127–132. [Google Scholar] [CrossRef] [PubMed]
  9. Sinatra, R.; Wang, D.; Deville, P.; Song, C.; Barabási, A.L. Quantifying the Evolution of Individual Scientific Impact. Science 2016, 354, aaf5239. [Google Scholar] [CrossRef] [PubMed]
  10. Hawkes, A.G. Point Spectra of Some Mutually Exciting Point Processes. J. R. Stat. Soc. Ser. Stat. Methodol. 1971, 33, 438–443. [Google Scholar] [CrossRef]
  11. Yu, Q.; Long, C.; Lv, Y.; Shao, H.; He, P.; Duan, Z. Predicting Co-Author Relationship in Medical Co-Authorship Networks. PLoS ONE 2014, 9, e101214. [Google Scholar] [CrossRef] [PubMed]
  12. Zhao, Z.; Liu, W.; Qian, Y.; Nie, L.; Yin, Y.; Zhang, Y. Identifying advisor-advisee relationships from co-author networks via a novel deep model. Inf. Sci. 2018, 466, 258–269. [Google Scholar] [CrossRef]
  13. Asatani, K.; Mori, J.; Ochi, M.; Sakata, I. Detecting trends in academic research from a citation network using network representation learning. PLoS ONE 2018, 13, e0197260. [Google Scholar] [CrossRef] [PubMed]
  14. Wang, W.; Tang, T.; Xia, F.; Gong, Z.; Chen, Z.; Liu, H. Collaborative Filtering With Network Representation Learning for Citation Recommendation. IEEE Trans. Big Data 2022, 8, 1233–1246. [Google Scholar] [CrossRef]
  15. Wang, S.; Gai, K.; Yu, J.; Zhang, Z.; Zhu, L. PraVFed: Practical Heterogeneous Vertical Federated Learning via Representation Learning. IEEE Trans. Inf. Forensics Secur. 2025, 20, 2693–2705. [Google Scholar] [CrossRef]
  16. Zhuang, J.; Mateu, J. A Semiparametric Spatiotemporal Hawkes-Type Point Process Model with Periodic Background for Crime Data. J. R. Stat. Soc. Ser. A 2019, 182, 919–942. [Google Scholar] [CrossRef]
  17. Bernabeu, A.; Zhuang, J.; Mateu, J. Spatio-Temporal Hawkes Point Processes: A Review. J. Agric. Biol. Environ. Stat. 2025, 30, 89–119. [Google Scholar] [CrossRef]
  18. Laub, P.J.; Lee, Y.; Pollett, P.K.; Taimre, T. Hawkes Models and Their Applications. Annu. Rev. Stat. Its Appl. 2025, 12, 233–258. [Google Scholar] [CrossRef]
  19. Zhuang, J.; Ogata, Y.; Vere-Jones, D. Analyzing Earthquake Clustering Features by Using Stochastic Reconstruction. J. Geophys. Res. 2004, 109, B05301. [Google Scholar] [CrossRef]
  20. Zhuang, J. Second-Order Residual Analysis of Spatiotemporal Point Processes and Applications in Model Evaluation. J. R. Stat. Soc. Ser. Stat. Methodol. 2006, 68, 635–653. [Google Scholar] [CrossRef]
  21. Lange, K. MM Optimization Algorithms; SIAM: Philadelphia, PA, USA, 2016. [Google Scholar]
Figure 1. Intuitive explanation of the concept and overview of the proposed method. Left: hierarchical structure of academic networks. Right: overview of the proposed method.
Figure 2. Log-likelihood convergence.
Figure 3. Mohammad Ghavamzadeh: (a) Intensity function; (b) Citation correspondence.
Figure 4. Alessandro Lazaric: (a) Intensity function; (b) Citation correspondence.
Table 1. Dataset for Mohammad Ghavamzadeh: Basic Information.
| Title | Publisher (or Conference) | Time | Co-Author |
|---|---|---|---|
| Continuous-time Hierarchical Reinforcement Learning | | | |
| Hierarchically Optimal Average Reward Reinforcement Learning | ICML | 8 July 2002 | |
| Hierarchical Policy Gradient Algorithms | ICML | 2003 | |
| Bayesian Actor-Critic Algorithms | ICML | 2007 | |
| Hierarchical Average Reward Reinforcement Learning | JMLR | 2007 | |
| Regularized Policy Iteration | NeurIPS | 2008 | Amir-massoud Farahmand (Offspring) |
| Natural Actor-Critic Algorithms | Automatica | 2009 | |
| Analysis of a Classification-based Policy Iteration Algorithm | ICML | 2010 | Alessandro Lazaric (Offspring) |
| Bayesian Multi-task Reinforcement Learning | ICML | 2010 | Alessandro Lazaric (Offspring) |
| Finite-sample Analysis of LSTD | ICML | 21 July 2010 | Alessandro Lazaric (Offspring) |
| LSTD with Random Projections | NeurIPS | 2010 | Alessandro Lazaric (Offspring) |
| Finite-sample Analysis of Lasso-TD | ICML | 2011 | Alessandro Lazaric (Offspring) |
| Speedy Q-learning | NeurIPS | 2011 | |
| Classification-based Policy Iteration with a Critic | ICML | 5 May 2011 | |
| Approximate Modified Policy Iteration | AAAI | 2012 | |
| Finite-sample Analysis of Least-squares Policy Iteration | JMLR | 1 October 2012 | Alessandro Lazaric (Offspring) |
| Actor-Critic Algorithms for Risk-sensitive MDPs | NeurIPS | 2013 | Prashanth L.A. (Offspring) |
| Approximate Dynamic Programming Finally Performs Well in the Game of Tetris | NeurIPS | 2013 | |
| Algorithms for CVaR Optimization in MDPs | NeurIPS | 2014 | Yinlam Chow (Offspring) |
| High Confidence Policy Improvement | ICML | 2015 | Philip Thomas (Offspring) |
| Approximate Modified Policy Iteration and its Application to the Game of Tetris | JMLR | 2015 | |
| Maximum Entropy Semi-Supervised Inverse Reinforcement Learning | IJCAI | 2015 | Alessandro Lazaric (Offspring) |
| Personalized Ad Recommendation Systems for Life-time Value Optimization with Guarantees | IJCAI | 2015 | Philip Thomas (Offspring) |
| Policy Gradient for Coherent Risk Measures | NeurIPS | 2015 | Aviv Tamar (Co), Yinlam Chow (Offspring) |
| Proximal Gradient Temporal Difference Learning Algorithms | IJCAI | 2016 | |
| Regularized Policy Iteration with Nonparametric Function Spaces | JMLR | 1 January 2016 | |
| Analysis of Classification-based Policy Iteration Algorithms | JMLR | 2016 | Alessandro Lazaric (Offspring) |
| Safe Policy Improvement by Minimizing Robust Baseline Regret | NeurIPS | 2016 | Yinlam Chow (Offspring) |
| Bayesian Policy Gradient and Actor-Critic Algorithms | JMLR | 2016 | |
| Improved Learning Complexity in Combinatorial Pure Exploration Bandits | AISTATS | 2 May 2016 | Alessandro Lazaric (Offspring) |
| Sequential Decision-making with Coherent Risk | IEEE TAC | 2017 | Aviv Tamar (Co), Yinlam Chow (Offspring) |
| Risk-constrained Reinforcement Learning with Percentile Risk Criteria | JMLR | 2018 | Yinlam Chow (Offspring) |
| More Robust Doubly Robust Off-policy Evaluation | ICML | 2018 | Yinlam Chow (Offspring) |
| A Lyapunov-based Approach to Safe Reinforcement Learning | NeurIPS | 2018 | Yinlam Chow (Offspring) |
| Path Consistency Learning in Tsallis Entropy Regularized MDPs | ICML | 3 July 2018 | Yinlam Chow (Offspring) |
| Garbage In, Reward Out: Bootstrapping Exploration in Multi-armed Bandits | ICML | 2019 | |
| Tight Regret Bounds for Model-based Reinforcement Learning with Greedy Policies | NeurIPS | 2019 | Yinlam Chow (Offspring) |
| Risk-sensitive Generative Adversarial Imitation Learning | AISTATS | 11 April 2019 | Yinlam Chow (Offspring) |
| Optimizing Over a Restricted Policy Class in MDPs | AISTATS | 11 April 2019 | |
| Predictive Coding for Locally-linear Control | ICML | 21 November 2020 | Yinlam Chow (Offspring) |
Table 2. Dataset for Mohammad Ghavamzadeh: Citation Information.
TitleCitationsCitations (1 Year)Citations (3 Years)Citations (5 Years)
Continuous-time Hierarchical Reinforcement Learning551414
Hierarchically Optimal Average Reward Reinforcement Learning21147
Hierarchical Policy Gradient Algorithms791717
Bayesian Actor-Critic Algorithms751823
Hierarchical Average Reward Reinforcement Learning4281316
Regularized Policy Iteration16311545
Natural Actor-Critic Algorithms1084543127
Analysis of a Classification-based Policy Iteration Algorithm8821941
Bayesian Multi-task Reinforcement Learning14311435
Finite-sample Analysis of LSTD8732540
LSTD with Random Projections7411428
Finite-sample Analysis of Lasso-TD5742333
Speedy Q-learning21541127
Classification-based Policy Iteration with a Critic3091723
Approximate Modified Policy Iteration6211630
Finite-sample Analysis of Least-squares Policy Iteration13331022
Actor-Critic Algorithms for Risk-sensitive MDPs3101217
Approximate Dynamic Programming Finally Performs Well in the Game of Tetris7811335
Algorithms for CVaR Optimization in MDPs37041752
High Confidence Policy Improvement22022876
Approximate Modified Policy Iteration and its Application to the Game of Tetris15421137
Maximum Entropy Semi-Supervised Inverse Reinforcement Learning3821022
Personalized Ad Recommendation Systems for Life-time Value Optimization with Guarantees20032261
Policy Gradient for Coherent Risk Measures1422823
Proximal Gradient Temporal Difference Learning Algorithms321816
Regularized Policy Iteration with Nonparametric Function Spaces12521241
Analysis of Classification-based Policy Iteration Algorithms5211023
Safe Policy Improvement by Minimizing Robust Baseline Regret16111355
Bayesian Policy Gradient and Actor-Critic Algorithms611826
Improved Learning Complexity in Combinatorial Pure Exploration Bandits4521430
Sequential Decision-making with Coherent Risk8062357
Risk-constrained Reinforcement Learning with Percentile Risk Criteria562343181
More Robust Doubly Robust Off-policy Evaluation275696180
A Lyapunov-based Approach to Safe Reinforcement Learning5795114355
Path Consistency Learning in Tsallis Entropy Regularized MDPs4622131
Garbage In, Reward Out: Bootstrapping Exploration in Multi-armed Bandits7863968
Tight Regret Bounds for Model-based Reinforcement Learning with Greedy Policies7854670
Risk-sensitive Generative Adversarial Imitation Learning3432029
Optimizing Over a Restricted Policy Class in MDPs123911
Predictive Coding for Locally-linear Control2421624
Table 3. Dataset: Alessandro Lazaric.
| Title | Publisher (or Conference) | Date | Co-Author |
|---|---|---|---|
| Reinforcement learning in continuous action spaces through sequential monte carlo methods | NeurIPS | 2007 | |
| Transfer of samples in batch reinforcement learning | ICML | 5 July 2008 | |
| Finite-sample analysis of LSTD | ICML | 21 June 2010 | M. Ghavamzadeh (Cluster) |
| Analysis of a Classification-based Policy Iteration Algorithm | ICML | 2010 | M. Ghavamzadeh (Cluster) |
| Bayesian Multi-task Reinforcement Learning | ICML | 2010 | M. Ghavamzadeh (Cluster) |
| LSTD with random projections | NeurIPS | 2010 | M. Ghavamzadeh (Cluster) |
| Finite-sample analysis of Lasso-TD | ICML | 2011 | M. Ghavamzadeh (Cluster) |
| Transfer from multiple MDPs | NeurIPS | 2011 | |
| Risk-aversion in multi-armed bandits | NeurIPS | 2012 | |
| Finite-sample analysis of least-squares policy iteration | JMLR | 1 October 2012 | M. Ghavamzadeh (Cluster) |
| Sequential transfer in multi-armed bandit with finite set of models | NeurIPS | 2013 | |
| Exploiting easy data in online optimization | NeurIPS | 2014 | |
| Best-arm identification in linear bandits | NeurIPS | 2014 | |
| Sparse multi-task reinforcement learning | NeurIPS | 2014 | |
| Maximum entropy semi-supervised inverse reinforcement learning | IJCAI | 25 July 2015 | M. Ghavamzadeh (Cluster) |
| Direct policy iteration with demonstrations | IJCAI | 25 July 2015 | |
| Analysis of Classification-based Policy Iteration Algorithms | JMLR | 2016 | M. Ghavamzadeh (Cluster) |
| Improved Learning Complexity in Combinatorial Pure Exploration Bandits | AISTATS | 2 May 2016 | M. Ghavamzadeh (Cluster) |
| Regret minimization in MDPs with options without prior knowledge | NeurIPS | 2017 | |
| Near optimal exploration-exploitation in non-communicating Markov decision processes | NeurIPS | 2018 | |
| Fighting boredom in recommender systems with linear reinforcement learning | NeurIPS | 2018 | |
| Efficient bias-span-constrained exploration-exploitation in reinforcement learning | ICML | 3 July 2018 | |
| Improved regret bounds for Thompson sampling in linear quadratic control problems | ICML | 3 July 2018 | Marc Abeille (Offspring) |
| A structured prediction approach for generalization in cooperative multi-agent reinforcement learning | NeurIPS | 2019 | |
| Exploration bonus for regret minimization in discrete and continuous average reward MDPs | NeurIPS | 2019 | |
| Regret bounds for learning state representations in reinforcement learning | NeurIPS | 2019 | |
| Limiting extrapolation in linear approximate value iteration | NeurIPS | 2019 | E. Brunskill (Co-author) |
| Provably efficient reward-agnostic navigation with linear value iteration | NeurIPS | 2020 | E. Brunskill (Co-author) |
| Improved sample complexity for incremental autonomous exploration in MDPs | NeurIPS | 2020 | Jean Tarbouriech (Offspring) |
| Learning near optimal policies with low inherent Bellman error | ICML | 21 November 2020 | E. Brunskill (Co-author) |
| Frequentist regret bounds for randomized least-squares value iteration | AISTATS | 3 June 2020 | E. Brunskill (Co-author) |
| Active model estimation in Markov decision processes | CUAI | 2020 | Jean Tarbouriech (Offspring) |
| No-regret exploration in goal-oriented reinforcement learning | ICML | 21 November 2020 | Jean Tarbouriech (Offspring) |
| Efficient optimistic exploration in linear-quadratic regulators via Lagrangian relaxation | ICML | 21 November 2020 | Marc Abeille (Offspring) |
| Reinforcement learning with prototypical representations | ICML | 1 July 2021 | Denis Yarats (Offspring) |
| Mastering visual continuous control: Improved data-augmented reinforcement learning | arXiv | 20 July 2021 | Denis Yarats (Offspring) |
| A provably efficient sample collection strategy for reinforcement learning | NeurIPS | 6 December 2021 | Jean Tarbouriech (Offspring) |
| Stochastic shortest path: Minimax, parameter-free and towards horizon-free regret | NeurIPS | 6 December 2021 | Jean Tarbouriech (Offspring) |
| Reinforcement learning in linear MDPs: Constant regret and representation selection | NeurIPS | 6 December 2021 | Matteo Papini (Offspring) |
| Adaptive multi-goal exploration | AISTATS | 3 May 2022 | Jean Tarbouriech (Offspring) |
Table 4. Dataset: Alessandro Lazaric: Citation Information.
TitleCitationsCo-AuthorCitations (1 Year)Citations (3 Years)Citations (5 Years)
Reinforcement learning in continuous action spaces through sequential monte carlo methods197 62645
Transfer of samples in batch reinforcement learning209 41328
Finite-sample analysis of LSTD87M. Ghavamzadeh (Cluster)32540
Analysis of a Classification-based Policy Iteration Algorithm88M. Ghavamzadeh (Cluster)21941
Bayesian Multi-task Reinforcement Learning143M. Ghavamzadeh (Cluster)11435
LSTD with random projections73M. Ghavamzadeh (Cluster)61730
Finite-sample analysis of Lasso-TD57M. Ghavamzadeh (Cluster)42333
Transfer from multiple MDPs62 41322
Risk-aversion in multi-armed bandits186 51727
Finite-sample analysis of least-squares policy iteration133M. Ghavamzadeh (Cluster)31022
Sequential transfer in multi-armed bandit with finite set of models121 11832
Exploiting easy data in online optimization62 11932
Best-arm identification in linear bandits212 22145
Sparse multi-task reinforcement learning83 11022
Maximum entropy semi-supervised inverse reinforcement learning38M. Ghavamzadeh (Cluster)21022
Direct policy iteration with demonstrations44 21226
Analysis of Classification-based Policy Iteration Algorithms52M. Ghavamzadeh (Cluster)11023
Improved Learning Complexity in Combinatorial Pure Exploration Bandits45M. Ghavamzadeh (Cluster)21430
Regret minimization in MDPs with options without prior knowledge30 2916
Near optimal exploration-exploitation in non-communicating Markov decision processes49 11227
Fighting boredom in recommender systems with linear reinforcement learning53 32450
Efficient bias-span-constrained exploration-exploitation in reinforcement learning115 11861
Improved regret bounds for Thompson sampling in linear quadratic control problems106Marc Abeille (Offspring)14587
A structured prediction approach for generalization in cooperative multi-agent reinforcement learning27 1721
Exploration bonus for regret minimization in discrete and continuous average reward MDPs44 12436
Regret bounds for learning state representations in reinforcement learning13 2913
Limiting extrapolation in linear approximate value iteration39E. Brunskill (Co-author)102338
Provably efficient reward-agnostic navigation with linear value iteration61E. Brunskill (Co-author)11647
Improved sample complexity for incremental autonomous exploration in MDPs17Jean Tarbouriech (Offspring)11017
Learning near optimal policies with low inherent Bellman error234E. Brunskill (Co-author)167198
Frequentist regret bounds for randomized least-squares value iteration146E. Brunskill (Co-author)249126
Active model estimation in Markov decision processes29Jean Tarbouriech (Offspring)1523
No-regret exploration in goal-oriented reinforcement learning43Jean Tarbouriech (Offspring)32942
Efficient optimistic exploration in linear-quadratic regulators via Lagrangian relaxation40Marc Abeille (Offspring)12840
Reinforcement learning with prototypical representations210Denis Yarats (Offspring)190209
Mastering visual continuous control: Improved data-augmented reinforcement learning259Denis Yarats (Offspring)178258
A provably efficient sample collection strategy for reinforcement learning18Jean Tarbouriech (Offspring)1618
Stochastic shortest path: Minimax, parameter-free and towards horizon-free regret34Jean Tarbouriech (Offspring)328
Reinforcement learning in linear MDPs: Constant regret and representation selection19Matteo Papini (Offspring)619
Adaptive multi-goal exploration4Jean Tarbouriech (Offspring)14

Share and Cite

MDPI and ACS Style

Ganeshbabu, B.; Liu, X.; Matono, A.; Kim, K.-S.; Shen, X. Scientific Impact Prediction via Virtual Geography Hawkes Process. Appl. Sci. 2026, 16, 2085. https://doi.org/10.3390/app16042085
