Article

Coarse-Grained Hawkes Processes

Department of Interdisciplinary Statistical Mathematics, The Institute of Statistical Mathematics, Tokyo 190-8562, Japan
Entropy 2025, 27(6), 555; https://doi.org/10.3390/e27060555
Submission received: 21 April 2025 / Revised: 15 May 2025 / Accepted: 23 May 2025 / Published: 25 May 2025
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract

When analyzing real-world event data, it is often the case that bin-count processes are observed instead of precise event time-stamps along a continuous timeline, owing to practical limitations in measurement accuracy. In this work, we propose a modeling framework for aggregated event data generated by multivariate Hawkes processes. The introduced model, termed the coarse-grained Hawkes process, effectively captures the second-order statistical characteristics of the bin-count representation of the Hawkes process, particularly when the bin size is large relative to the typical support of the excitation kernel. Building upon this model, we develop a method for inferring the underlying Hawkes process from bin-count observations, and demonstrate through simulation studies that the proposed approach performs comparably to, or even surpasses, existing techniques, while maintaining computational efficiency in parameter estimation.

1. Introduction

In many natural and social systems, past events facilitate the occurrence of future events through a self-exciting mechanism. The Hawkes process [1,2], a class of self-exciting point processes, has become a popular tool for modeling such event dynamics in continuous time. Its defining intensity function, or instantaneous event rate, is composed of a baseline rate augmented by the cumulative influence of prior events, thereby capturing the self-exciting mechanism wherein each event elevates the probability of subsequent occurrences. Owing to the prevalence of self-excitation across diverse domains, the Hawkes process has been extensively employed in a wide range of disciplines, including seismology [3,4], neurophysiology [5,6], genomics [7], finance [8,9], social media analytics [10,11,12], criminology [13,14], terrorism studies [15], and traffic incident analysis [16].
The Hawkes process is applicable to scenarios wherein individual events are distinguishable, as it characterizes a series of discrete occurrences along a continuous timeline. Nevertheless, it proves insufficient for modeling count-based sequences in which the precise timing of events is unobserved and only their aggregated counts within successive intervals are available. A representative case arises in epidemiological studies, where individual infection events are not monitored in real time, but rather, the daily incidence rates are recorded. Consequently, there is a need to develop a count-based time series model that retains the salient features of the Hawkes process for such contexts.
Multiple methodologies have been proposed for fitting the Hawkes process to bin-count data. Kirchner demonstrated that the distribution of bin-count sequences generated by Hawkes processes can be effectively approximated using an integer-valued autoregressive (INAR) model, wherein conditional least squares estimation is employed to infer the underlying Hawkes process [17,18]. Shlomovich et al. introduced a Monte Carlo expectation-maximization (MC-EM) framework, which integrates an efficient sampling algorithm for the latent event times with an EM procedure to maximize the likelihood function [19,20]. Similarly, Chen et al. developed a Pseudo-Marginal Metropolis–Hastings (PMMH) algorithm for maximum likelihood estimation, wherein the likelihood is approximated via a sequential Monte Carlo approach [21]. Alternatively, Cheysson and Lang advocated for a spectral estimation technique grounded in the Whittle likelihood within the univariate context [22].
Although Kirchner’s approach provides consistent and asymptotically normal estimators for the underlying Hawkes process as the bin width approaches zero, it introduces bias when the bin size exceeds the typical support of the excitation function, due to the INAR model’s neglect of intra-bin excitation dynamics. In contrast, the MC-EM, PMMH, and spectral estimation methods yield less biased results by accounting for excitation effects occurring within each bin. Nonetheless, the MC-EM and PMMH techniques are computationally intensive, whereas the spectral method is validated only in the univariate setting.
Accordingly, the objective of this study is to develop a computationally efficient methodology that extends naturally to the multivariate setting. Our proposed approach involves constructing a count-based time series model that approximates the bin-count sequence generated by Hawkes processes, upon which an estimation procedure is built. Unlike Kirchner’s method, which relies on a straightforward discretization, namely, evaluating the intensity function at discrete time points, to relate the Hawkes and INAR processes [17], our model is derived through a coarse graining procedure. The resulting framework, termed the coarse-grained Hawkes process, is defined in a conceptually simple and analytically tractable form, while effectively preserving the second-order statistical characteristics of the bin-count Hawkes process, even in regimes where the bin size is large relative to the excitation function’s effective range.
The structure of this paper is as follows. Section 2 provides a concise overview of Hawkes processes. Section 3 introduces the coarse-grained Hawkes process along with the proposed estimation methodology. In Section 4, we assess the approximation accuracy of the coarse-grained Hawkes process in representing the bin-count Hawkes process. Additionally, a simulation study is conducted to benchmark the performance of the proposed estimation technique against existing approaches. Section 5 concludes with a discussion of the findings.

2. Review of Hawkes Processes

In this section, we present a brief review of Hawkes processes. Let $\{t_i\}_{i \in \mathbb{N}}$ denote a sequence of nonnegative random variables representing the occurrence times of events on $\mathbb{R}_+$, satisfying $t_i < t_{i+1}$ for $i \in \mathbb{N}$. Define the associated counting process by $N(t) = \sum_{i \in \mathbb{N}} \mathbb{1}_{t_i \le t}$, and let $\mathcal{H}_t = \{N(u) : u < t\}$ denote the history of events up to, but not including, time $t$. We consider a point process such that
$$P(N(t+\delta t) - N(t) = 1 \mid \mathcal{H}_t) = \lambda(t)\,\delta t + o(\delta t), \qquad P(N(t+\delta t) - N(t) > 1 \mid \mathcal{H}_t) = o(\delta t),$$
as $\delta t \to 0$, where $\lambda(t)$ is the conditional intensity function that uniquely characterizes the point process. A univariate Hawkes process is then defined by the conditional intensity function
$$\lambda(t) = \mu + \int_0^t \phi(t-u)\,dN(u),$$
where $\mu \ge 0$ is the baseline intensity and $\phi(\cdot)$ is a nonnegative excitation kernel satisfying $\phi(t) = 0$ for $t < 0$ [1]. This formulation captures the self-exciting nature of the process, wherein the intensity at any given time depends on the historical sequence of events. The branching ratio is defined as
$$\alpha := \int_0^\infty \phi(t)\,dt,$$
which quantifies the expected number of subsequent events triggered by a single occurrence. The process admits a stationary distribution provided that $\alpha < 1$.
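For concreteness, a univariate Hawkes process with exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$ (so the branching ratio is $\alpha$) can be simulated by Ogata's thinning algorithm. The sketch below is illustrative; the function name and the exponential parametrization are our own choices, not part of the paper.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, seed=None):
    """Ogata thinning for a univariate Hawkes process with kernel
    phi(t) = alpha * beta * exp(-beta * t); requires alpha < 1 for stationarity."""
    rng = np.random.default_rng(seed)
    t, s = 0.0, 0.0          # s = excitation part of the intensity at current time t
    times = []
    while True:
        lam_bar = mu + s                       # the intensity can only decay until the next event
        w = rng.exponential(1.0 / lam_bar)     # waiting time to the next candidate point
        t += w
        if t > T:
            break
        s *= np.exp(-beta * w)                 # decay the excitation to the candidate time
        if rng.uniform() * lam_bar <= mu + s:  # accept with probability lambda(t) / lam_bar
            times.append(t)
            s += alpha * beta                  # each accepted event adds a jump of height alpha*beta
    return np.array(times)
```

With $\mu = 0.5$, $\alpha = 0.5$, $\beta = 1$, the expected number of events on $[0, T]$ is roughly $\mu T / (1 - \alpha)$.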
The univariate Hawkes process naturally generalizes to the multivariate case [1,2]. For $d \in \mathbb{N}$, a $d$-dimensional Hawkes process $N(t) = (N_1(t), \ldots, N_d(t))^T$ comprises $d$ jointly defined point processes on $\mathbb{R}_+$. The corresponding vector-valued conditional intensity function is given by
$$\boldsymbol{\lambda}(t) = \boldsymbol{\mu} + \int_0^t \Phi(t-u)\,dN(u),$$
where $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_d)^T \in \mathbb{R}_{\ge 0}^d$ is the baseline intensity vector, and the excitation kernel $\Phi = (\phi_{ij})_{1 \le i,j \le d}$ is a matrix-valued function with nonnegative entries satisfying $\phi_{ij}(t) = 0$ for $t < 0$. Each entry $\phi_{ij}(\cdot)$ describes the influence of process $j$ on the intensity of process $i$, thus capturing both self-excitation and mutual excitation among the components. The multivariate Hawkes process is asymptotically stationary if the spectral radius of the branching matrix
$$A := \int_0^\infty \Phi(t)\,dt$$
is strictly less than one.
Finally, we briefly recall the martingale properties of point processes that are instrumental for our methodological developments. For a comprehensive treatment, see [23]. Define the process $M(t) = N(t) - \Lambda(t)$, where $\Lambda(t) = \int_0^t \lambda(u)\,du$. Then, $M(t)$ is a martingale with respect to the filtration $\mathcal{H}_t$, satisfying the property that the conditional expectation of its increment is zero:
$$E[M(t) - M(s) \mid \mathcal{H}_s] = 0, \qquad t > s.$$
Moreover, the conditional covariance matrix of the martingale increment is given by
$$E\big[(M(t) - M(s))(M(t) - M(s))^T \mid \mathcal{H}_s\big] = \mathrm{diag}\big(E[\Lambda(t) - \Lambda(s) \mid \mathcal{H}_s]\big), \qquad t > s,$$
which follows from the quadratic variation of the martingale.

3. Coarse-Grained Hawkes Process

3.1. Motivation

We consider a scenario wherein the exact timing of individual events is unobserved; instead, we observe an aggregated count of these latent continuous-time events within discrete time intervals. We define
$$X_n = N(n\,\Delta t) - N((n-1)\,\Delta t), \qquad n \in \mathbb{N},$$
as the bin-wise event counts over intervals of size $\Delta t$. When $\{N(t) : t \in \mathbb{R}_+\}$ constitutes a Hawkes process, the corresponding discrete-time vector process $\{X_n : n \in \mathbb{N}\}$ is referred to as the binned Hawkes process [19,20].
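Given simulated or recorded event times, the binned sequence $X_n$ is simply a histogram over a regular grid. A minimal sketch (the function name is ours):

```python
import numpy as np

def bin_counts(event_times, T, dt):
    """Aggregate event times on [0, T] into bin counts X_n = N(n*dt) - N((n-1)*dt)."""
    edges = np.arange(0.0, T + dt, dt)      # bin edges 0, dt, 2*dt, ..., covering [0, T]
    counts, _ = np.histogram(event_times, bins=edges)
    return counts
```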
Although a closed-form representation of the probability distribution of binned Hawkes processes is not available, an approximate formulation can be derived by discretizing the conditional intensity function as follows:
$$\lambda_n = (\lambda_{1n}, \ldots, \lambda_{dn})^T = \boldsymbol{\mu}\,\Delta t + \sum_{k=1}^{n-1} \Phi((n-k)\,\Delta t)\,\Delta t\,X_k, \tag{1}$$
and assuming that the bin counts follow Poisson distributions:
$$X_n \mid X_1, \ldots, X_{n-1} \sim \bigotimes_{j=1}^d \mathrm{Poisson}(\lambda_{jn}). \tag{2}$$
This method, known as the binned Poisson approximation, converges in distribution to the true Hawkes process as $\Delta t \to 0$. However, its accuracy degrades with larger $\Delta t$ due to discretization errors in (1) and the conditional independence assumption in (2), which neglects the intra-bin excitation effect. Nonetheless, it facilitates a tractable estimation procedure: the log-likelihood of the sequence $X_1, \ldots, X_n$ is given, up to an additive constant, by
$$\log P(X_1, \ldots, X_n) = \sum_{k=1}^{n} \sum_{j=1}^{d} \big( X_{jk} \log \lambda_{jk} - \lambda_{jk} \big).$$
Parameter estimation is then performed via maximization of this log-likelihood function [19,20,24].
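In the univariate case with exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$, the binned Poisson log-likelihood (dropping the constant $\log X_n!$ terms) can be sketched as follows; the function name and the exponential parametrization are illustrative assumptions, not the paper's code.

```python
import numpy as np

def binned_poisson_loglik(X, mu, alpha, beta, dt):
    """Log-likelihood of bin counts under the binned Poisson approximation,
    for a univariate exponential kernel phi(t) = alpha*beta*exp(-beta*t)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    lags = np.arange(1, n)
    phi = alpha * beta * np.exp(-beta * lags * dt) * dt   # phi(k*dt) * dt, k = 1..n-1
    ll = 0.0
    for m in range(n):
        past = X[:m][::-1]                      # X_{m-1}, ..., X_0
        lam = mu * dt + np.dot(phi[:m], past)   # discretized intensity for bin m
        ll += X[m] * np.log(lam) - lam
    return ll
```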
In light of the limitations of this approximation for larger Δ t , we propose an alternative count time series model that more accurately approximates the binned Hawkes process while retaining computational simplicity for inference. The underlying heuristic is elucidated in the following discussion.
The central idea is to replace the crude discretization with the expected value of the conditional intensity integrated over each time bin. The expected count within the $n$th bin of process $i$ is $\int_{(n-1)\Delta t}^{n\Delta t} \lambda_i(t)\,dt$, which depends on the trajectory $\{N(u) : 0 < u \le n\Delta t\}$. Consequently, we consider the conditional expectation, given the bin counts $\{X_1, \ldots, X_n\}$:
$$\lambda_{in} = E\Big[ \int_{(n-1)\Delta t}^{n\Delta t} \lambda_i(t)\,dt \,\Big|\, X_1, \ldots, X_n \Big] = \mu_i\,\Delta t + \sum_{k=1}^{n-1} \sum_{j=1}^{d} E\Big[ \int_{(n-1)\Delta t}^{n\Delta t} \int_{(k-1)\Delta t}^{k\Delta t} \phi_{ij}(t-u)\,dN_j(u)\,dt \,\Big|\, X_{jk} \Big] + \sum_{j=1}^{d} E\Big[ \int_{(n-1)\Delta t}^{n\Delta t} \int_{(n-1)\Delta t}^{t} \phi_{ij}(t-u)\,dN_j(u)\,dt \,\Big|\, X_{jn} \Big]. \tag{3}$$
As these conditional expectations do not admit closed-form expressions, we approximate them under the assumption that event times are uniformly distributed within each bin. Accordingly, the second term on the right-hand side of (3) is approximated as
$$E\Big[ \int_{(n-1)\Delta t}^{n\Delta t} \int_{(k-1)\Delta t}^{k\Delta t} \phi_{ij}(t-u)\,dN_j(u)\,dt \,\Big|\, X_{jk} \Big] \approx \int_{(k-1)\Delta t}^{k\Delta t} \!\!\cdots\! \int_{(k-1)\Delta t}^{k\Delta t} \frac{1}{\Delta t^{X_{jk}}} \sum_{i'=1}^{X_{jk}} \int_{(n-1)\Delta t}^{n\Delta t} \phi_{ij}(t - t_{i'})\,dt\; dt_1 \cdots dt_{X_{jk}} = \phi_{n-k}^{ij}\,X_{jk},$$
where $\phi_{n-k}^{ij}$ denotes the coarse-grained kernel, defined as
$$\phi_{n-k}^{ij} = \frac{1}{\Delta t} \int_{(k-1)\Delta t}^{k\Delta t} \int_{(n-1)\Delta t}^{n\Delta t} \phi_{ij}(t-u)\,dt\,du. \tag{4}$$
Similarly, the third term on the right-hand side of (3) is approximated by
$$E\Big[ \int_{(n-1)\Delta t}^{n\Delta t} \int_{(n-1)\Delta t}^{t} \phi_{ij}(t-u)\,dN_j(u)\,dt \,\Big|\, X_{jn} \Big] = E\Big[ \int_{(n-1)\Delta t}^{n\Delta t} \int_{(n-1)\Delta t}^{n\Delta t} \phi_{ij}(t-u)\,dN_j(u)\,dt \,\Big|\, X_{jn} \Big] \approx \phi_0^{ij}\,X_{jn},$$
where we have utilized the causal property $\phi_{ij}(t) = 0$ for $t < 0$. Substituting these approximations back into (3) yields
$$\lambda_{in} \approx \mu_i\,\Delta t + \sum_{k=1}^{n} \sum_{j=1}^{d} \phi_{n-k}^{ij}\,X_{jk}. \tag{5}$$
In contrast to the formulation in (1), Equation (5) incorporates the coarse-grained kernel, capturing both inter-bin excitation and intra-bin self-excitation effects through the terms $\phi_0^{ij} X_{jn}$ ($1 \le i, j \le d$). Building upon this approximation, we will formally define the coarse-grained Hawkes process in the subsequent section.

3.2. Definition

Firstly, we formally define the coarse-grained kernel, previously derived heuristically in (4).
Definition 1.
Let $\phi(t)$ be a nonnegative excitation kernel defined on the real line, satisfying $\phi(t) = 0$ for $t < 0$. The coarse-grained kernel $\{\phi_k\}_{k \in \mathbb{N}_0}$ with bin size $\Delta t > 0$ is defined by
$$\phi_k = \begin{cases} \xi_0, & k = 0, \\ \xi_k - \xi_{k-1}, & k = 1, 2, \ldots, \end{cases} \tag{6}$$
where
$$\xi_k = \frac{1}{\Delta t} \int_{k\Delta t}^{(k+1)\Delta t} \int_0^t \phi(u)\,du\,dt. \tag{7}$$
It can be readily verified that the formulation (6) and (7) coincides with the expression in (4).
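Definition 1 translates directly into numerical quadrature. The sketch below (names are ours) computes $\{\phi_k\}$ from the cumulative kernel $\int_0^t \phi(u)\,du$; as a hypothetical example we use a power-law kernel with branching ratio $0.8$ and cumulative $\alpha\,(1 - (c/(c+t))^p)$.

```python
import numpy as np
from scipy.integrate import quad

def coarse_grained_kernel(Phi_cum, dt, K):
    """phi_0, ..., phi_K from Definition 1, where Phi_cum(t) = int_0^t phi(u) du:
    xi_k = (1/dt) * int_{k dt}^{(k+1) dt} Phi_cum(t) dt; phi_0 = xi_0, phi_k = xi_k - xi_{k-1}."""
    xi = np.array([quad(Phi_cum, k * dt, (k + 1) * dt)[0] / dt for k in range(K + 1)])
    return np.concatenate(([xi[0]], np.diff(xi)))

# Hypothetical power-law kernel: branching ratio 0.8, scale c = 1, exponent p = 1.5.
alpha_br, c, p = 0.8, 1.0, 1.5
phi = coarse_grained_kernel(lambda t: alpha_br * (1 - (c / (c + t)) ** p), dt=1.0, K=400)
```

The checks mirror Lemma 1 below: the total mass of the coarse-grained kernel recovers the branching ratio, and $\phi_0$ stays strictly below it.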
Lemma 1.
The coarse-grained kernel satisfies
$$\sum_{k=0}^{\infty} \phi_k = \int_0^{\infty} \phi(u)\,du =: \alpha.$$
Moreover, it holds that $\phi_0 < \alpha$.
Proof. 
See Appendix A.1.    □
Lemma 1 guarantees that the total mass of the coarse-grained kernel equals the integral of the excitation kernel, i.e., the branching ratio. Furthermore, it ensures that the kernel does not collapse to the degenerate case $\phi_0 = \alpha$ as long as $\Delta t < \infty$.
Based on the coarse-grained kernel, we introduce the coarse-grained Hawkes process. Consider a probability space equipped with a sequence of $d$-dimensional, integer-valued random vectors $\{\tilde{X}_k\}_{k \in \mathbb{Z}}$. Define the history up to bin $k$ as $\mathcal{H}_k = \{\tilde{X}_j : j \le k\}$. We consider the $d$-dimensional coarse-grained process given in (5):
$$\lambda_n = \boldsymbol{\mu}\,\Delta t + \sum_{k \le n} \Phi_{n-k}\,\tilde{X}_k, \qquad n \in \mathbb{Z}, \tag{8}$$
where $\boldsymbol{\mu} \in \mathbb{R}_{\ge 0}^d$ is the baseline intensity vector, and each $\Phi_{n-k} = (\phi_{n-k}^{ij})_{1 \le i,j \le d}$ is a matrix whose elements are given by the coarse-grained kernels for bin size $\Delta t$. Define the residual process as
$$\Delta M_n = \tilde{X}_n - \lambda_n, \qquad n \in \mathbb{Z}, \tag{9}$$
and impose the following conditional moment properties, analogous to the martingale conditions in point process theory:
$$E[\Delta M_n \mid \mathcal{H}_{n-1}] = 0, \tag{10}$$
$$E[\Delta M_n\,\Delta M_n^T \mid \mathcal{H}_{n-1}] = \mathrm{diag}\big( E[\lambda_n \mid \mathcal{H}_{n-1}] \big). \tag{11}$$
Importantly, these conditional moment properties do not fully specify the probability law of the process, but constrain its behavior through the second-order statistical structure. Consequently, multiple processes may exist that fulfill these conditions. The coarse-grained Hawkes process is thus defined as the equivalence class of sequences $\{\tilde{X}_k\}$ satisfying (8)–(11).
The term $\Phi_0 \tilde{X}_n$ on the right-hand side of (8) encapsulates the effect of intra-bin excitation, which induces both cross-correlations and overdispersion in the count statistics. Assuming that the spectral radius of $\Phi_0$ is strictly less than one, the conditional expectation and covariance matrix of $\tilde{X}_n$ given $\mathcal{H}_{n-1}$ are, respectively,
$$E[\tilde{X}_n \mid \mathcal{H}_{n-1}] = (I - \Phi_0)^{-1} \Big( \boldsymbol{\mu}\,\Delta t + \sum_{k \le n-1} \Phi_{n-k}\,\tilde{X}_k \Big) =: \bar{\lambda}_n,$$
$$\mathrm{Var}(\tilde{X}_n \mid \mathcal{H}_{n-1}) = (I - \Phi_0)^{-1}\,\mathrm{diag}(\bar{\lambda}_n)\,(I - \Phi_0^T)^{-1},$$
where $I$ denotes the $d \times d$ identity matrix. Derivations of these expressions are provided in Appendix A.2. Due to the presence of the matrix factor $(I - \Phi_0)^{-1}$, the conditional covariance matrix is generally nondiagonal, capturing cross-correlations, and its diagonal elements exceed those of $\bar{\lambda}_n$, reflecting overdispersion. When $\Phi_0$ is the zero matrix, implying the absence of intra-bin excitation, the conditional variance reduces to $\mathrm{Var}(\tilde{X}_n \mid \mathcal{H}_{n-1}) = \mathrm{diag}(\bar{\lambda}_n)$, corresponding to the conditional independence of Poisson-distributed counts.
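A small numerical illustration of these formulas; the intra-bin excitation matrix $\Phi_0$ and conditional mean vector below are hypothetical values chosen for the example, not taken from the paper.

```python
import numpy as np

# Hypothetical bivariate (d = 2) intra-bin excitation matrix and conditional mean.
Phi0 = np.array([[0.3, 0.1],
                 [0.1, 0.3]])
lam_bar = np.array([2.0, 3.0])

M = np.linalg.inv(np.eye(2) - Phi0)
cov = M @ np.diag(lam_bar) @ M.T   # conditional covariance of the bin counts

# Diagonal entries exceed lam_bar (overdispersion); off-diagonals are nonzero
# (cross-correlation induced by intra-bin excitation).
print(np.diag(cov), cov[0, 1])
```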

3.3. Stationary Process

We investigate the coarse-grained Hawkes process under the assumption of stationarity and derive its second-order statistical properties. Assuming stationarity, Equations (8) and (10) yield
$$E[\tilde{X}_n] = (I - A)^{-1} \boldsymbol{\mu}\,\Delta t =: \lambda, \qquad n \in \mathbb{Z}, \tag{14}$$
where
$$A := \sum_{n=0}^{\infty} \Phi_n$$
denotes the branching ratio matrix. Consequently, the spectral radius of $A$ must be strictly less than one.
To derive the second-order moment of the stationary coarse-grained Hawkes process, we first establish a white noise property of the residual process.
Lemma 2.
Let $\{\tilde{X}_n\}$ denote a coarse-grained Hawkes process whose branching ratio matrix has a spectral radius less than one. Then, the residual process (9) forms a stationary sequence satisfying $E[\Delta M_n] = 0$, $n \in \mathbb{Z}$, and
$$E[\Delta M_n\,\Delta M_{n'}^T] = \delta_{nn'}\,\mathrm{diag}(\lambda), \qquad n, n' \in \mathbb{Z},$$
where $\delta_{nn'}$ denotes the Kronecker delta.
Proof. 
See Appendix A.3.    □
As a consequence of the white noise structure of the residual process, the stationary coarse-grained Hawkes process admits a moving average representation of infinite order,
$$\tilde{X}_n - \lambda = \sum_{k=0}^{\infty} \Psi_k\,\Delta M_{n-k}, \tag{16}$$
where the effective kernel matrices $\Psi_k$ are defined by
$$\Psi_k = \sum_{j=0}^{\infty} (\Phi^{(j)})_k,$$
with $\Phi^{(j)}$ denoting the $j$-fold convolution of $\Phi$, and $(\Phi^{(0)})_k = \delta_{k0} I$. It is worth noting that the branching ratio matrix can be expressed in terms of the effective kernel matrices as follows:
$$A = I - \Big( \sum_{n=0}^{\infty} \Psi_n \Big)^{-1},$$
thereby establishing a relationship between the branching ratio and the cumulative influence of preceding events. The detailed derivation is presented in Appendix A.4. Using the representation (16), the autocovariance structure is obtained analogously to linear time series models.
Theorem 1.
Let $\{\tilde{X}_n\}$ be a coarse-grained Hawkes process with a branching ratio matrix whose spectral radius is less than one. Then, the autocovariance of $\{\tilde{X}_n\}$ is given by
$$R_j := \mathrm{Cov}(\tilde{X}_n, \tilde{X}_{n+j}) = \sum_{l=0}^{\infty} \Psi_{l+j}\,\mathrm{diag}(\lambda)\,\Psi_l^T, \qquad j \in \mathbb{Z},$$
with the convention $\Psi_k = 0$ for $k < 0$.
Proof. 
See Appendix A.5.    □
The spectral density matrix of the stationary coarse-grained Hawkes process is obtained by taking the Fourier transform of the autocovariance sequence:
$$F_{\Delta t}^{(cg)}(\omega) = \frac{1}{2\pi} \sum_{n=-\infty}^{\infty} R_n\,e^{-i\omega n} = \frac{1}{2\pi}\,\hat{\Psi}(\omega)\,\mathrm{diag}(\lambda)\,\hat{\Psi}^*(\omega),$$
where $\hat{\Psi}(\omega) = \sum_{n=0}^{\infty} \Psi_n e^{-i\omega n}$ is the Fourier transform of the effective kernel matrix and $^*$ denotes the conjugate transpose. Alternatively, it can be expressed via the Fourier transform of the coarse-grained kernel matrix, $\hat{\Phi}(\omega) = \sum_{n=0}^{\infty} \Phi_n e^{-i\omega n}$, as
$$F_{\Delta t}^{(cg)}(\omega) = \frac{1}{2\pi}\,(I - \hat{\Phi}(\omega))^{-1}\,\mathrm{diag}(\lambda)\,\big((I - \hat{\Phi}(\omega))^{-1}\big)^*,$$
utilizing the identity $\hat{\Psi}(\omega) = (I - \hat{\Phi}(\omega))^{-1}$.
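The identity $\hat{\Psi}(\omega) = (I - \hat{\Phi}(\omega))^{-1}$ can be checked numerically. In this univariate sketch we take a hypothetical geometric coarse-grained kernel; $\Psi_k$ is obtained from the recursion implied by $(I - \hat{\Phi})\hat{\Psi} = I$, namely $\psi_0 = (1-\phi_0)^{-1}$ and $\psi_k = (1-\phi_0)^{-1} \sum_{j=1}^{k} \phi_j \psi_{k-j}$.

```python
import numpy as np

# Hypothetical univariate coarse-grained kernel: phi_k = 0.5 * 0.6 * 0.4**k,
# so the branching ratio is sum(phi) = 0.5 < 1. Truncate at K lags.
K = 200
k = np.arange(K + 1)
phi = 0.5 * 0.6 * 0.4 ** k
lam = 2.0                                  # stationary mean count per bin

psi = np.zeros(K + 1)
psi[0] = 1.0 / (1.0 - phi[0])
for m in range(1, K + 1):
    psi[m] = np.dot(phi[1:m + 1], psi[m - 1::-1]) / (1.0 - phi[0])

def spec_psi(w):
    """Spectral density via the effective kernel: |Psi_hat|^2 * lam / (2*pi)."""
    psi_hat = np.sum(psi * np.exp(-1j * w * k))
    return np.abs(psi_hat) ** 2 * lam / (2 * np.pi)

def spec_phi(w):
    """Spectral density via the coarse-grained kernel: lam / (2*pi*|1 - Phi_hat|^2)."""
    phi_hat = np.sum(phi * np.exp(-1j * w * k))
    return lam / (2 * np.pi * np.abs(1 - phi_hat) ** 2)
```

Both routes agree up to truncation error, which is negligible here because $\psi_k$ decays geometrically.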
In summary, under the condition that the spectral radius of the branching ratio matrix is less than one, the expected value of the process remains constant over time, and its autocovariance depends solely on the lag between observations, not their absolute positions in time. Accordingly, the coarse-grained Hawkes process is weakly stationary.

3.4. Approximation to Hawkes Process

We now establish a rigorous connection between the original Hawkes process and its coarse-grained counterpart. Specifically, we examine the second-order statistical properties of both processes under the assumption of stationarity. A summary of the statistical properties of the stationary Hawkes process is provided in Appendix B.
Since the sum of the coarse-grained kernel coincides with the integral of the excitation kernel (see Lemma 1), the branching ratio matrices, and consequently, the conditions for stationarity, are identical for both processes. Under stationarity, the expected event counts of the coarse-grained Hawkes process (14) coincide with those of the original Hawkes process (A7).
We proceed to compare the spectral density matrices of the two processes, focusing in particular on the convergence behavior of the spectral density of the coarse-grained process toward that of the original Hawkes process.
Theorem 2.
The spectral density matrix of the coarse-grained Hawkes process satisfies
$$F_{\Delta t}^{(cg)}(\nu\,\Delta t) = F^{(hw)}(\nu)\,\Delta t + O(\Delta t^3), \tag{18}$$
as $\Delta t \to 0$, where $F^{(hw)}(\nu)$ denotes the spectral density matrix (A6) of the original Hawkes process.
Proof. 
See Appendix A.6.    □
Considering the binned Hawkes process, its spectral density matrix $F_{\Delta t}^{(hw)}(\omega)$ behaves as (see Appendix B)
$$F_{\Delta t}^{(hw)}(\nu\,\Delta t) = F^{(hw)}(\nu)\,\Delta t + O(\Delta t^3),$$
as $\Delta t \to 0$, which matches the expansion in (18) up to second-order terms in $\Delta t$. This leads to the following corollary.
Corollary 1.
The spectral density matrix of the coarse-grained Hawkes process approximates that of the binned Hawkes process to third-order accuracy, in the sense that
$$F_{\Delta t}^{(cg)}(\omega) = F_{\Delta t}^{(hw)}(\omega) + O(\Delta t^3),$$
as $\Delta t \to 0$.
For comparative purposes, consider the binned Poisson approximation defined by Equations (1) and (2). Its spectral density matrix $F_{\Delta t}^{(po)}(\omega)$ behaves as (see Appendix A.6)
$$F_{\Delta t}^{(po)}(\omega) = F_{\Delta t}^{(hw)}(\omega) + O(\Delta t^2).$$
Therefore, the coarse-grained Hawkes process yields a spectral approximation to the binned Hawkes process that is accurate to a higher order than the binned Poisson approximation.

3.5. Parameter Estimation Method

We address the problem of estimating the parameters of a Hawkes process from binned count data. Assume that we observe a sequence of binned event counts $\{X_1, \ldots, X_n\}$ of length $n$ generated by a $d$-dimensional Hawkes process, whose excitation kernel matrices $\Phi_k(\theta)$ are specified by a parametric form with an unknown parameter vector $\theta$. Since the likelihood function of the coarse-grained Hawkes process is unavailable, owing to the absence of a fully specified probability law, we propose a parameter estimation method for $\theta$ grounded in the second-order statistical properties of the coarse-grained Hawkes process.
To this end, we utilize the AR($\infty$) representation of the coarse-grained Hawkes process (see Appendix A.4),
$$\tilde{X}_n - \lambda = \sum_{k=1}^{\infty} \tilde{\Phi}_k(\theta)\,(\tilde{X}_{n-k} - \lambda) + \Delta \tilde{M}_n(\theta),$$
where $\tilde{\Phi}_k(\theta) = (I - \Phi_0(\theta))^{-1} \Phi_k(\theta)$, and $\Delta \tilde{M}_n(\theta) = (I - \Phi_0(\theta))^{-1} \Delta M_n(\theta)$ is a zero-mean, cross-correlated white noise sequence satisfying
$$E[\Delta \tilde{M}_n(\theta)\,\Delta \tilde{M}_{n'}(\theta)^T] = \delta_{nn'}\,(I - \Phi_0(\theta))^{-1}\,\mathrm{diag}(\lambda)\,(I - \Phi_0(\theta)^T)^{-1} =: \delta_{nn'}\,\Lambda(\theta), \qquad n, n' \in \mathbb{Z}.$$
Based on this representation, we define a loss function in quadratic form, whose minimizer yields an estimator of the parameter vector:
$$L_n(\theta) = \sum_{k=1}^{n} \Big( Y_k - \sum_{j=1}^{k-1} \tilde{\Phi}_j(\theta)\,Y_{k-j} \Big)^T \Lambda^{-1}(\theta) \Big( Y_k - \sum_{j=1}^{k-1} \tilde{\Phi}_j(\theta)\,Y_{k-j} \Big) + n \log |\Lambda(\theta)|,$$
where $Y_k = X_k - \lambda$. In practice, the unknown mean vector $\lambda$ is replaced by the empirical mean $\hat{\lambda} = n^{-1} \sum_{k=1}^{n} X_k$. The loss function then simplifies to
$$L_n(\theta) = \sum_{k=1}^{n} \Big( Y_k - \sum_{j=0}^{k-1} \Phi_j(\theta)\,Y_{k-j} \Big)^T \mathrm{diag}(\hat{\lambda})^{-1} \Big( Y_k - \sum_{j=0}^{k-1} \Phi_j(\theta)\,Y_{k-j} \Big) - 2n \log |I - \Phi_0(\theta)|, \tag{20}$$
where the constant term $n \log |\mathrm{diag}(\hat{\lambda})|$ has been omitted. The first term on the right-hand side of (20) corresponds to a weighted quadratic loss, while the second term serves as a regularization component that discourages trivial solutions of the form $\Phi_k(\theta) = \delta_{k0} I$. The optimal parameter estimate is obtained by minimizing the loss function with respect to $\theta$, which can be efficiently carried out using a gradient-based optimization algorithm.
The estimation procedure is summarized in Algorithm 1.
Algorithm 1 Estimation procedure for Hawkes processes
1: Given the observed bin-count sequence $\{X_1, \ldots, X_n\}$, compute the empirical mean $\hat{\lambda} = n^{-1} \sum_{k=1}^{n} X_k$ and center the data as $Y_k = X_k - \hat{\lambda}$ for $k = 1, \ldots, n$.
2: Determine the parameter estimate $\hat{\theta}$ by minimizing the loss function (20).
3: Using the estimated parameter vector $\hat{\theta}$, compute the estimate of the baseline intensity as $\hat{\boldsymbol{\mu}} = (I - \hat{A})\,\hat{\lambda} / \Delta t$, where $\hat{A} = \sum_{n=0}^{\infty} \Phi_n(\hat{\theta})$ denotes the estimated branching ratio matrix.
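A minimal univariate sketch of Algorithm 1, under the assumption of an exponential kernel $\phi(t) = \alpha\beta e^{-\beta t}$, whose coarse-grained kernel has the closed form given in Section 4.1. All function names are ours, and we use Nelder–Mead rather than the BFGS method of the paper, handling the constraints $0 < \alpha < 1$, $\beta > 0$ crudely via an infinite penalty.

```python
import numpy as np
from scipy.optimize import minimize

def cg_kernel_exp(alpha, beta, dt, K):
    """Coarse-grained kernel of phi(t) = alpha*beta*exp(-beta*t), lags k = 0..K."""
    k = np.arange(1, K + 1)
    phi0 = alpha * (1 - (1 - np.exp(-beta * dt)) / (beta * dt))
    phik = (alpha * (np.exp(beta * dt) + np.exp(-beta * dt) - 2)
            * np.exp(-beta * k * dt) / (beta * dt))
    return np.concatenate(([phi0], phik))

def loss(theta, Y, lam_hat, dt):
    """Univariate version of the quadratic loss (20); theta = (alpha, beta)."""
    alpha, beta = theta
    if not (0 < alpha < 1 and 0 < beta < 100):   # crude constraint handling
        return np.inf
    phi = cg_kernel_exp(alpha, beta, dt, len(Y) - 1)
    resid = Y - np.convolve(phi, Y)[:len(Y)]     # Y_k - sum_{j=0}^{k-1} phi_j Y_{k-j}
    return np.sum(resid ** 2) / lam_hat - 2 * len(Y) * np.log(1 - phi[0])

def estimate(X, dt, theta0=(0.3, 1.0)):
    """Algorithm 1: center the counts, minimize the loss, recover the baseline."""
    X = np.asarray(X, dtype=float)
    lam_hat = X.mean()
    Y = X - lam_hat
    res = minimize(loss, theta0, args=(Y, lam_hat, dt), method="Nelder-Mead")
    alpha_hat = res.x[0]
    mu_hat = (1 - alpha_hat) * lam_hat / dt      # step 3: mu_hat = (1 - A_hat) lam_hat / dt
    return res.x, mu_hat
```

For Poisson input with no self-excitation, $\hat{\alpha}$ should be driven toward small values and $\hat{\mu} \approx \bar{X}/\Delta t$; recovering the parameters of a genuine binned Hawkes realization requires simulating and binning the process as in Section 4.2.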

4. Numerical Experiments

4.1. Assessment of Second-Order Characteristics

As established in Corollary 1, the coarse-grained Hawkes process asymptotically approximates the spectral density matrix of the binned Hawkes process as the bin size $\Delta t \to 0$. In this section, we conduct numerical investigations to evaluate the validity and robustness of this approximation for increasing values of $\Delta t$. To this end, we consider a bivariate Hawkes process ($d = 2$), where each component of the excitation kernel matrix is defined as
$$\phi_{ij}(t) = \alpha_{ij}\,g_{ij}(t), \qquad t \ge 0,$$
with $g_{ij}(t)$ denoting a normalized kernel satisfying $\int_0^{\infty} g_{ij}(t)\,dt = 1$. The coefficient $\alpha_{ij}$ specifies the branching ratio from component $j$ to component $i$, and $g_{ij}(t)$ characterizes the distribution of waiting times for event excitation. We specifically focus on a symmetric bivariate Hawkes process where the parameters satisfy $\mu_1 = \mu_2 = \mu$, $g_{ij}(t) = g(t)$ ($1 \le i, j \le 2$), and the excitation kernel matrix takes the form
$$\begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix} = \begin{pmatrix} \alpha^{(s)} & \alpha^{(c)} \\ \alpha^{(c)} & \alpha^{(s)} \end{pmatrix}.$$
The stationarity condition for the process holds if $\alpha^{(s)} + \alpha^{(c)} < 1$. Under this constraint, the second-order statistical structure of the symmetric Hawkes process is described by the power spectral density (PSD) of each component (the diagonal elements of the spectral density matrix) and the cross-spectral density (CSD) between them (the off-diagonal elements).
We illustrate our findings using an exponential kernel defined as
$$g(t) = \beta e^{-\beta t}, \qquad t \ge 0,$$
where $\beta^{-1}$ denotes the expected waiting time before excitation. Figure 1 presents the PSD and CSD, along with the auto- and cross-covariance functions, of the binned Hawkes process with parameters $\alpha^{(s)} = 0.4$, $\alpha^{(c)} = 0.3$, and $\beta = 1$, for bin sizes $\Delta t = 0.1$ (a), $1$ (b), and $2$ (c). These are depicted using blue dotted lines.
The associated coarse-grained Hawkes process is constructed using the coarse-grained kernel derived from the exponential function:
$$g_k = \begin{cases} 1 - \dfrac{1 - e^{-\beta \Delta t}}{\beta \Delta t}, & k = 0, \\[2mm] \dfrac{\big(e^{\beta \Delta t} + e^{-\beta \Delta t} - 2\big)\,e^{-\beta k \Delta t}}{\beta \Delta t}, & k = 1, 2, \ldots \end{cases}$$
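This closed form follows from Definition 1 by direct integration; a quick numerical cross-check against the defining integral for the $k = 0$ term (illustrative code, names ours):

```python
import numpy as np

def cg_exp(beta, dt, K):
    """Closed-form coarse-grained kernel g_k of g(t) = beta*exp(-beta*t)."""
    k = np.arange(1, K + 1)
    g0 = 1 - (1 - np.exp(-beta * dt)) / (beta * dt)
    gk = (np.exp(beta * dt) + np.exp(-beta * dt) - 2) * np.exp(-beta * k * dt) / (beta * dt)
    return np.concatenate(([g0], gk))

beta, dt = 1.0, 2.0
g = cg_exp(beta, dt, K=40)

# Midpoint-rule evaluation of xi_0 = (1/dt) * int_0^dt (1 - exp(-beta*t)) dt,
# the k = 0 case of Definition 1 for this kernel.
t_mid = (np.arange(20000) + 0.5) * (dt / 20000)
xi0_num = np.mean(1 - np.exp(-beta * t_mid))
```

The total mass sums (telescopes) to $\xi_K \to 1$, the branching ratio of the normalized kernel, as in Lemma 1.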
In the same figures, the four second-order statistics of the coarse-grained process are represented by red lines, while those corresponding to the binned Poisson approximation are shown in green.
From these comparisons, we observe that for small bin sizes ($\Delta t = 0.1 < \beta^{-1}$), both the coarse-grained Hawkes process and the binned Poisson approximation closely reproduce the second-order behavior of the binned Hawkes process (Figure 1a). However, for $\Delta t = 1$, which is comparable to the mean waiting time, the Poisson approximation deteriorates significantly (Figure 1b), whereas the coarse-grained Hawkes process continues to provide a high-fidelity approximation. Even for a larger bin size $\Delta t = 2$, exceeding the characteristic time scale $\beta^{-1}$, the coarse-grained model remains accurate (Figure 1c).
To quantitatively assess the fidelity of the approximations, we introduce a divergence measure based on the log-determinant of the spectral density matrices. Specifically, let $F_{\Delta t}^{(\bullet)}(\omega)$ denote the spectral density matrix of the approximating process, where $\bullet$ indicates either the coarse-grained (cg) or Poisson (po) approximation. Then, the information loss relative to the binned Hawkes process $F_{\Delta t}^{(hw)}(\omega)$ is defined as
$$\Delta h^{(\bullet)} = \frac{1}{4\pi} \int_{-\pi}^{\pi} \log \frac{\big|F_{\Delta t}^{(hw)}(\omega)\big|}{\big|F_{\Delta t}^{(\bullet)}(\omega)\big|}\,d\omega,$$
which corresponds to the gap in maximum entropy rates under spectral constraints.
Figure 2 plots the information loss for both approximations as a function of Δ t , across varying values of α ( s ) and α ( c ) . The results clearly indicate that the coarse-grained Hawkes process incurs minimal information loss across a wide range of parameter settings. In contrast, the accuracy of the binned Poisson approximation degrades with increasing Δ t , with the loss exacerbated further as α ( s ) and α ( c ) increase.
We additionally examined the case where the excitation kernel follows a power-law distribution instead of an exponential decay. The qualitative behavior remained consistent (see Appendix C; Figure A1 and Figure A2). These findings collectively demonstrate that the coarse-grained Hawkes process offers a substantially improved approximation of the second-order dynamics of the binned Hawkes process, particularly for larger bin widths.

4.2. Parameter Estimation

We now investigate the efficacy of the proposed estimation method in inferring the parameters of a Hawkes process from bin-count data. Specifically, we consider an asymmetric bivariate Hawkes process with exponential excitation kernels given by $g_{ij}(t) = \beta_{ij} e^{-\beta_{ij} t}$ ($1 \le i, j \le 2$). The parameters of the Hawkes process are set as follows:
$$\begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix} = \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \qquad \begin{pmatrix} \alpha_{11} & \alpha_{12} \\ \alpha_{21} & \alpha_{22} \end{pmatrix} = \begin{pmatrix} 0.4 & 0.5 \\ 0.3 & 0.2 \end{pmatrix}, \qquad \begin{pmatrix} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{pmatrix} = \begin{pmatrix} 0.5 & 0.7 \\ 0.3 & 1.0 \end{pmatrix}.$$
The numerical experiments were conducted as follows. First, realizations of the Hawkes process were generated over the interval $[0, T]$ and subsequently discretized into bin-count sequences using bin size $\Delta t$. The ten model parameters $\{\mu_i, \alpha_{ij}, \beta_{ij}\}_{1 \le i, j \le 2}$ were then estimated from these sequences. Parameter estimation was performed by minimizing the loss function (20) using the quasi-Newton method (BFGS), with finite difference approximation employed for gradient evaluation.
To assess the performance of our proposed method, we compare it against three established approaches. The first is the MC-EM algorithm proposed in [20], for which we employed the publicly available implementation [25]. To ensure a fair comparison, we adopted the tuning parameters specified in [20], setting the number of Monte Carlo samples to 10. The second method is maximum likelihood estimation (MLE) applied to the binned Poisson approximation, with the MLE computed using the quasi-Newton method (BFGS). The third method involves conditional least squares estimation for the INAR(p) process, as introduced in [18]. Since the INAR(p) framework yields nonparametric estimates of the excitation kernels, we obtained the corresponding parametric kernel parameters by fitting an exponential function to the nonparametric estimates [19,20].
Figure 3 presents boxplots of the estimated values for each of the ten parameters across 500 simulated realizations of the binned Hawkes process, using $T = 1000$ and $\Delta t = 2$. It is apparent that both the binned Poisson MLE method and the INAR($p$) method produce significantly biased estimates, which is expected given that these approaches disregard excitation effects within each bin. Furthermore, we observe a substantial number of outliers in the estimates of $\beta_{ij}$ across all four methods, indicating that estimation of the kernel scales exhibits higher variance than estimation of the baseline intensities or branching ratios. For instance, our method yielded estimates of $\beta_{22}$ that deviated by a factor of 100 from the true value in 16 out of the 500 trials, highlighting the challenges in accurately estimating $\beta_{ij}$ when the bin size is large relative to the kernel time scale.
Figure 4 displays the root mean squared error (RMSE) of the parameter estimates across various bin sizes. For the RMSE of $\hat{\beta}_{ij}$, extreme outliers with values exceeding 100 were excluded to mitigate their undue influence. Overall, the proposed method demonstrates superior performance compared to both the binned Poisson MLE and the INAR($p$) approach, and achieves accuracy comparable to the MC-EM algorithm. Additionally, both the MC-EM and the proposed methods consistently maintain low RMSE values across different bin sizes, whereas the RMSEs for the binned Poisson MLE and the INAR($p$) approach increase with larger bin sizes. A similar trend is observed for $\hat{\beta}_{ij}$, albeit with more fluctuation due to outliers.
Figure 5 and Figure 6 illustrate, respectively, the bias and standard deviation components of the RMSE for each of the ten parameters. The proposed method yields the lowest bias among all methods, with only a few exceptions (Figure 5); meanwhile, the MC-EM algorithm attains the lowest standard deviation (Figure 6).
We further investigated the case in which the excitation kernel follows a power-law distribution rather than an exponential decay, thereby testing a different (non-memoryless) kernel. Our results confirm that the proposed method remains effective in this setting (Appendix C; Figure A3, Figure A4, Figure A5 and Figure A6).
In conclusion, the proposed method matches the performance of the MC-EM algorithm while outperforming both the binned Poisson MLE and the INAR(p) method, particularly in terms of bias reduction. It provides robust and stable estimates for the baseline intensities μ i and the branching ratios α i j , with estimation accuracy largely unaffected by bin size. Accurate estimation of the kernel scales β i j is feasible when the bin size is smaller than the characteristic kernel scale, but becomes unreliable as the bin size increases.

4.3. Choice of Parametric Form of Excitation Kernel

To investigate whether the statistical properties of the coarse-grained Hawkes process depend on the specific parametric form of the kernel function, we considered four probability density functions (PDFs): gamma, power-law, log-normal, and Weibull. Figure 7 shows the PDFs of these four distributions, all sharing a common mean and standard deviation, together with their coarse-grained counterparts. As the bin size Δ t increases, the coarse-grained kernels converge and become indistinguishable from one another, because the detailed shape of the distributions is averaged out. This observation suggests that our method is robust to the parametric form of the excitation kernel when the bin size is large relative to the kernel timescale.

5. Discussion

In this study, we introduced the coarse-grained Hawkes process as an analytical approximation to the binned Hawkes process. Unlike conventional discretization techniques, the proposed framework incorporates a coarse-grained excitation kernel that systematically accounts for intra-bin excitations. Consequently, the coarse-grained Hawkes process faithfully reproduces the second-order statistical properties of the binned Hawkes process, even when the bin size exceeds the characteristic timescale of the excitation kernel. Moreover, we demonstrated that the proposed approach enables stable estimation of Hawkes process parameters from bin-count data. In particular, both the branching ratios and baseline intensities can be reliably inferred, irrespective of the temporal resolution of the bin-count sequences.
A central distinction between our approach and the Monte Carlo Expectation-Maximization (MC-EM) algorithm lies in the treatment of latent event times within the bin-count data. Whereas the MC-EM method necessitates Monte Carlo sampling from the conditional distribution of unobserved events, resulting in considerable computational burden, our method employs a parsimonious assumption that events are uniformly distributed within each bin. This assumption facilitates the analytical derivation of approximate conditional expectations. Despite its simplicity, the proposed method achieves estimation accuracy on par with that of the MC-EM algorithm while offering substantial computational advantages.
An additional strength of our approach is its robustness to the parametric form of the excitation kernel as the bin size increases. As the detailed shape of the kernel becomes averaged out, the parametric specification becomes largely irrelevant. Consequently, for large bin sizes relative to the kernel timescale, our method remains effective irrespective of the precise functional form of the excitation kernel.
We further highlight a theoretical connection between our estimation framework and the spectral method previously validated in the univariate setting [22]. After applying the Fourier transform, the loss function in Equation (19) asymptotically approximates, for large n, the spectral likelihood:
L n ( θ ) m = 1 n X ^ m F Δ t ( c g ) ( ω m ) 1 X ^ m + log | F Δ t ( c g ) ( ω m ) | ,
where ω m = 2 π m / n ,
X ^ m = 1 2 π n k = 1 n ( X ˜ k λ ) e 2 π i k m / n ,
and X ^ m ∗ denotes the Hermitian transpose of X ^ m . Notably, the spectral likelihood for the multivariate binned Hawkes process can be obtained by replacing F Δ t ( c g ) ( ω m ) with the spectral density matrix of the binned Hawkes process.
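In the univariate case, this spectral likelihood can be sketched numerically. The following is a minimal illustration, not the paper's implementation: the function name `whittle_loss` and the flat (Poisson-like) model spectral density used in the usage example are our own choices, and the model density is passed in as a generic callable.

```python
import numpy as np

def whittle_loss(x, f_model):
    """Whittle-type spectral loss for univariate bin counts (illustrative sketch).

    x       : 1-D array of bin counts
    f_model : callable returning the model spectral density at omega in (0, 2*pi)
    """
    n = len(x)
    lam = x.mean()                                   # empirical stationary mean
    # DFT of the centered counts, normalized as in the text (phase offsets
    # relative to the paper's indexing do not affect the squared modulus)
    dft = np.fft.fft(x - lam) / np.sqrt(2 * np.pi * n)
    loss = 0.0
    for m in range(1, n):                            # skip the zero frequency
        omega = 2 * np.pi * m / n
        f = f_model(omega)
        loss += np.abs(dft[m]) ** 2 / f + np.log(f)
    return loss

# usage with a flat (Poisson-like) spectral density f(omega) = mean / (2*pi)
rng = np.random.default_rng(0)
x = rng.poisson(5.0, size=256).astype(float)
val = whittle_loss(x, lambda w: x.mean() / (2 * np.pi))
```

For a constant spectral density, Parseval's identity reduces the quadratic term to the scaled sum of squared deviations, which gives a quick sanity check on the normalization.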
Although our analysis focused on temporal Hawkes processes, the proposed modeling framework readily extends to space-time Hawkes processes. By discretizing both the temporal and spatial domains and counting the number of events in each bin, one obtains multivariate bin-count sequences to which the coarse-grained Hawkes process can be applied. In this extension, the coarse-graining procedure must be conducted in both time and space.
Finally, we emphasize the potential extension of the proposed framework to nonstationary time series. Owing to its formulation in the time domain, the coarse-grained Hawkes process is amenable to integration within state-space modeling paradigms [26,27], offering a promising direction for modeling nonstationary dynamics. We propose this as a compelling avenue for future research.

Funding

This work was funded by the Japan Society for the Promotion of Science (JSPS) KAKENHI Grant Number JP22H03695.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author declares no conflicts of interest.

Appendix A. Proofs

Appendix A.1. Proof of Lemma 1

It follows from Equations (6) and (7) that
k = 0 ϕ k = lim n ξ 0 + ( ξ 1 ξ 0 ) + + ( ξ n ξ n 1 ) = lim n ξ n = lim n 1 Δ t n Δ t ( n + 1 ) Δ t 0 t ϕ ( u ) d u d t = lim n 1 Δ t 0 Δ t 0 n Δ t + s ϕ ( u ) d u d s .
Since 0 n Δ t + s ϕ ( u ) d u is nonnegative and monotonically nondecreasing with respect to n, we may invoke the dominated convergence theorem to interchange the limit and the integral, yielding
k = 0 ϕ k = 1 Δ t 0 Δ t lim n 0 n Δ t + s ϕ ( u ) d u d s = 1 Δ t 0 Δ t 0 ϕ ( u ) d u d s = α .
Furthermore, since 0 t ϕ ( u ) d u is continuous, nonnegative, and monotonically nondecreasing in t, we obtain
ϕ 0 = 1 Δ t 0 Δ t 0 t ϕ ( u ) d u d t < 1 Δ t 0 Δ t 0 ϕ ( u ) d u d t = α .
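For the exponential kernel ϕ(t) = αβ e^(−βt), the integrated kernel φ(t) = α(1 − e^(−βt)) makes the ξ_k available in closed form, so Lemma 1 can be checked numerically. A minimal sketch (the parameter values α = 0.8, β = 1, Δt = 2 are illustrative, not from the paper):

```python
import math

ALPHA, BETA, DT = 0.8, 1.0, 2.0      # illustrative parameters

def xi(k):
    # xi_k = (1/dt) * integral of phi(t) over [k*dt, (k+1)*dt],
    # with phi(t) = alpha * (1 - exp(-beta * t)) for the exponential kernel
    a, b = k * DT, (k + 1) * DT
    return ALPHA * (1.0 + (math.exp(-BETA * b) - math.exp(-BETA * a)) / (BETA * DT))

# coarse-grained kernel: phi_0 = xi_0, phi_k = xi_k - xi_{k-1} for k >= 1
phis = [xi(0)] + [xi(k) - xi(k - 1) for k in range(1, 200)]
```

The partial sums telescope to ξ_n, so the truncated sum of the ϕ_k should approach α, and ϕ_0 should stay strictly below α, as the lemma asserts.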

Appendix A.2. Derivation of (12) and (13)

From Equations (8) and (10), we have
E [ X ˜ n H n 1 ] = E μ Δ t + k n Φ n k X ˜ k H n 1 = μ Δ t + k n 1 Φ n k X ˜ k + Φ 0 E [ X ˜ n H n 1 ] .
( I Φ 0 ) E [ X ˜ n H n 1 ] = μ Δ t + k n 1 Φ n k X ˜ k .
Since the spectral radius of Φ 0 is strictly less than one, the matrix ( I Φ 0 ) is invertible, and all entries of E [ X ˜ n H n 1 ] are nonnegative. Hence, the conditional mean vector of X ˜ n is given by expression (12).
To derive (13), observe that
X ˜ n λ n = ( I Φ 0 ) 1 ( I Φ 0 ) X ˜ n μ Δ t k n 1 Φ n k X ˜ k = ( I Φ 0 ) 1 Δ M n .
Therefore,
Var ( X ˜ n H n 1 ) = E ( X ˜ n λ n ) ( X ˜ n λ n ) T H n 1 = ( I Φ 0 ) 1 E Δ M n Δ M n T H n 1 ( I Φ 0 T ) 1 = ( I Φ 0 ) 1 diag ( E [ λ n H n 1 ] ) ( I Φ 0 T ) 1 ,
where the last equality follows from Equation (11). Finally, since E [ λ n H n 1 ] = λ n by (10), we obtain Equation (13).
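Expressions (12) and (13) can be evaluated directly with linear algebra routines. Below is a small numerical sketch for a hypothetical bivariate model with kernel matrices truncated at lag 1 and a single history bin; all parameter values are our own illustrative choices.

```python
import numpy as np

# Hypothetical coarse-grained kernel matrices (spectral radius of Phi_0 < 1)
Phi = [np.array([[0.30, 0.10], [0.05, 0.25]]),   # Phi_0: intra-bin excitation
       np.array([[0.10, 0.05], [0.02, 0.10]])]   # Phi_1
mu, dt = np.array([1.0, 0.5]), 1.0
X_hist = [np.array([2.0, 1.0])]                  # observed counts X_{n-1}

I = np.eye(2)
drive = mu * dt + Phi[1] @ X_hist[-1]            # mu*dt + sum_{k<n} Phi_{n-k} X_k
lam_n = np.linalg.solve(I - Phi[0], drive)       # Eq. (12): conditional mean
# Eq. (13): (I - Phi_0)^{-1} diag(lam_n) (I - Phi_0^T)^{-1}
V = np.linalg.solve(I - Phi[0], np.diag(lam_n)) @ np.linalg.inv(I - Phi[0]).T
```

By construction the conditional variance matrix is symmetric with a positive diagonal, which provides an immediate consistency check.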

Appendix A.3. Proof of Lemma 2

From Equation (10), it follows that
E [ Δ M n ] = E E [ Δ M n H n 1 ] = 0 .
To derive Equation (15), observe that for n < n ′ (and, by symmetry, for n > n ′ ),
E [ Δ M n Δ M n ′ T ] = E Δ M n E [ Δ M n ′ T H n ′ 1 ] = 0 .
Moreover, applying Equation (11), we obtain
E [ Δ M n Δ M n T ] = E E [ Δ M n Δ M n T H n 1 ] = E diag ( E [ λ n H n 1 ] ) = diag ( λ ) ,
which completes the proof.

Appendix A.4. AR(∞) and MA(∞) Representations

Using Equations (9) and (14), the rate Equation (8) can be reformulated as
X ˜ n λ = k = 0 Φ k ( X ˜ n k λ ) + Δ M n ,
( I Φ 0 ) ( X ˜ n λ ) = k = 1 Φ k ( X ˜ n k λ ) + Δ M n .
Under the stationarity condition, the inverse ( I Φ 0 ) 1 exists, yielding the autoregressive representation
X ˜ n λ = k = 1 Φ k ′ ( X ˜ n k λ ) + Δ M n ′ ,
where Φ k ′ = ( I Φ 0 ) 1 Φ k and Δ M n ′ = ( I Φ 0 ) 1 Δ M n .
To obtain a moving average representation, we apply the formal z-transform to both sides of Equation (A1):
Y ^ ( z ) = Φ ^ ( z ) Y ^ ( z ) + Δ M ^ ( z ) ,
where Y ^ ( z ) = n = ( X ˜ n λ ) z n , Φ ^ ( z ) = n = Φ n z n and Δ M ^ ( z ) = n = Δ M n z n . Solving for Y ^ ( z ) yields
Y ^ ( z ) = Ψ ^ ( z ) Δ M ^ ( z ) ,
where the transfer function is defined by
Ψ ^ ( z ) = ( I Φ ^ ( z ) ) 1 = j = 0 Φ ^ j ( z ) .
Applying the inverse z-transform then gives the MA(∞) process:
X ˜ n λ = k = 0 Ψ k Δ M n k ,
where Ψ k = j = 0 ( Φ ( j ) ) k . Furthermore, utilizing Equation (A2), and noting that Ψ ^ ( 1 ) = n = 0 Ψ n and Φ ^ ( 1 ) = A , the branching ratio matrix can be expressed in terms of Ψ n as A = I n = 0 Ψ n 1 .
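The relation ( I − Φ̂(z) ) Ψ̂(z) = I yields a simple recursion for computing the MA coefficients: ( I − Φ_0 ) Ψ_k equals the identity for k = 0 and Σ_{j=1..k} Φ_j Ψ_{k−j} for k ≥ 1. A sketch with a hypothetical geometrically decaying kernel (all parameter values are ours), which also verifies the stated identity A = I − ( Σ_n Ψ_n )^{−1}:

```python
import numpy as np

# Hypothetical kernel matrices Phi_k = Phi0 * r**k with geometric decay
Phi0 = np.array([[0.2, 0.05], [0.05, 0.2]])
def Phi(k):
    return Phi0 * 0.5 ** k

K = 400                                  # truncation order (tails are negligible)
I = np.eye(2)
B = np.linalg.inv(I - Phi(0))
Psi = [B]                                # Psi_0 = (I - Phi_0)^{-1}
for k in range(1, K):
    # (I - Phi_0) Psi_k = sum_{j=1}^{k} Phi_j Psi_{k-j}
    Psi.append(B @ sum(Phi(j) @ Psi[k - j] for j in range(1, k + 1)))

A = sum(Phi(k) for k in range(K))        # branching ratio matrix A = sum_k Phi_k
lhs = I - np.linalg.inv(sum(Psi))        # should reproduce A
```

Here the spectral radius of A is about 0.5, so the stationarity condition holds and the truncated sums converge rapidly.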

Appendix A.5. Proof of Theorem 1

Using Equations (15) and (16), the autocovariance matrix of { X ˜ n } is given by
Cov ( X ˜ n , X ˜ n + j ) = E [ ( X ˜ n λ ) ( X ˜ n + j λ ) T ] = E k = 0 Ψ k Δ M n k l = 0 Ψ l Δ M n + j l T = k = 0 l = 0 Ψ k E [ Δ M n k Δ M n + j l T ] Ψ l T = k = 0 l = 0 Ψ k diag ( λ ) δ k , l j Ψ l T = l = 0 Ψ l j diag ( λ ) Ψ l T ,
which completes the proof.
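In the univariate case, Theorem 1 reduces to c_j = λ Σ_l ψ_{l−j} ψ_l, which is straightforward to evaluate once the ψ_k are computed from the scalar version of the MA recursion. A self-contained sketch with an assumed geometric kernel (parameter values are illustrative):

```python
import numpy as np

# Univariate sketch: coarse-grained kernel phi_k = a * r**k (assumed values)
a, r, lam = 0.3, 0.5, 2.0              # branching ratio sum = 0.6 < 1 (stationary)
phi = a * r ** np.arange(200)

# MA(inf) coefficients from (1 - phi_hat(z)) * psi_hat(z) = 1:
# (1 - phi_0) psi_k = delta_{k,0} + sum_{j=1}^{k} phi_j psi_{k-j}
psi = np.zeros(200)
psi[0] = 1.0 / (1.0 - phi[0])
for k in range(1, 200):
    psi[k] = (phi[1:k + 1] @ psi[k - 1::-1]) / (1.0 - phi[0])

def autocov(j):
    # Cov(X_n, X_{n+j}) = lam * sum_l psi_{l-j} psi_l  (Theorem 1, scalar case)
    j = abs(j)
    return lam * float(psi[: len(psi) - j] @ psi[j:])
```

Since all ψ_k are positive and decaying here, the autocovariance should be positive, symmetric in the lag, and decreasing in |j|.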

Appendix A.6. Proof of Theorem 2

We provide a proof of Theorem 2 in the univariate setting. The extension to the multivariate case follows by applying the same reasoning component-wise to each element of the spectral density matrix. To begin, we establish the following approximation result for the Fourier transforms of the excitation kernel and its coarse-grained counterpart.
Lemma A1.
Let ϕ ˜ ( ν ) = 0 ϕ ( t ) e i ν t d t denote the continuous-time Fourier transform of the excitation kernel, and let ϕ ^ ( ω ) = k = 0 ϕ k e i ω k denote the discrete-time Fourier transform of the corresponding coarse-grained kernel. Then, in the limit as Δ t 0 , the following approximation holds:
ϕ ^ ( ν Δ t ) = ϕ ˜ ( ν ) + O ( Δ t 2 ) .
Proof. 
From (6) and (7), we obtain
ϕ ^ ( ν Δ t ) = k = 0 ϕ k e i ν k Δ t = ξ 0 + ( ξ 1 ξ 0 ) e i ν Δ t + ( ξ 2 ξ 1 ) e 2 i ν Δ t + = lim n k = 0 n 1 ξ k e i ν k Δ t ( 1 e i ν Δ t ) + ξ n e i ν n Δ t = lim n k = 0 n 1 k Δ t ( k + 1 ) Δ t φ ( t ) e i ν k Δ t d t · 1 e i ν Δ t Δ t + ξ n e i ν n Δ t ,
where φ ( t ) = 0 t ϕ ( u ) d u . To evaluate the integral, consider
k Δ t ( k + 1 ) Δ t φ ( t ) e i ν t d t = k Δ t ( k + 1 ) Δ t φ ( t ) e i ν k Δ t i ν e i ν k Δ t ( t k Δ t ) + O ( Δ t 2 ) d t = k Δ t ( k + 1 ) Δ t φ ( t ) e i ν k Δ t d t Q k + O ( Δ t 3 ) ,
where
Q k = i ν e i ν k Δ t 0 Δ t φ ( t + k Δ t ) t d t .
By Taylor’s theorem, for each t [ 0 , Δ t ] , there exists c ( k Δ t , ( k + 1 ) Δ t ) such that
φ ( t + k Δ t ) = φ ( k Δ t ) + φ ˙ ( c ) t = φ ( k Δ t ) + ϕ ( c ) t .
Note that c generally depends on t. Then, Q k is approximated as
Q k = i ν e i ν k Δ t 0 Δ t φ ( k Δ t ) + ϕ ( c ) t t d t = i ν e i ν k Δ t · φ ( k Δ t ) 2 Δ t 2 + O ( Δ t 3 ) .
Substituting into (A5), we obtain
k Δ t ( k + 1 ) Δ t φ ( t ) e i ν k Δ t d t = k Δ t ( k + 1 ) Δ t φ ( t ) e i ν t d t + i ν e i ν k Δ t · φ ( k Δ t ) 2 Δ t 2 + O ( Δ t 3 ) .
Therefore, for n 1 and Δ t n 1 , it follows that
k = 0 n 1 k Δ t ( k + 1 ) Δ t φ ( t ) e i ν k Δ t d t = 0 n Δ t φ ( t ) e i ν t d t + i ν Δ t 2 k = 0 n 1 φ ( k Δ t ) e i ν k Δ t Δ t + O ( Δ t 2 ) = 0 n Δ t φ ( t ) e i ν t d t + i ν Δ t 2 0 n Δ t φ ( t ) e i ν t d t + O ( Δ t 2 ) .
Substituting back into (A4) yields
ϕ ^ ( ν Δ t ) = lim n [ 0 n Δ t φ ( t ) e i ν t d t + i ν Δ t 2 0 n Δ t φ ( t ) e i ν t d t + O ( Δ t 2 ) × i ν + ν 2 Δ t 2 + O ( Δ t 2 ) + ξ n e i ν n Δ t ] = lim n i ν 0 n Δ t φ ( t ) e i ν t d t + ξ n e i ν n Δ t + O ( Δ t 2 ) .
The first term on the right-hand side can be further simplified as
i ν 0 n Δ t φ ( t ) e i ν t d t = φ ( t ) e i ν t 0 n Δ t + 0 n Δ t φ ˙ ( t ) e i ν t d t = φ ( n Δ t ) e i ν n Δ t + 0 n Δ t ϕ ( t ) e i ν t d t .
Consequently,
ϕ ^ ( ν Δ t ) = lim n 0 n Δ t ϕ ( t ) e i ν t d t φ ( n Δ t ) e i ν n Δ t + ξ n e i ν n Δ t + O ( Δ t 2 ) = 0 ϕ ( t ) e i ν t d t + O ( Δ t 2 ) ,
where we have used the identity lim n ξ n = φ ( ∞ ) . □
Now, we complete the proof of Theorem 2. According to Equation (17), the spectral density of the univariate coarse-grained Hawkes process is expressed as
f Δ t ( c g ) ( ω ) = λ Δ t 2 π 1 | 1 ϕ ^ ( ω ) | 2 .
Substituting Equation (A3) into the expression above yields
f Δ t ( c g ) ( ν Δ t ) = λ Δ t 2 π 1 | 1 ϕ ˜ ( ν ) + O ( Δ t 2 ) | 2 = λ Δ t 2 π 1 | 1 ϕ ˜ ( ν ) | 2 + O ( Δ t 2 ) = f ( h w ) ( ν ) Δ t + O ( Δ t 3 ) ,
thereby completing the proof of Theorem 2 in the univariate setting.
For comparison, consider the binned Poisson approximation introduced in Equations (1) and (2), wherein the discretized excitation kernel is defined by ϕ k = ϕ ( k Δ t ) Δ t for k = 1 , 2 , . The discrete-time Fourier transform of this kernel is given by
ϕ ^ ( ν Δ t ) = k = 1 ϕ ( k Δ t ) e i ν k Δ t Δ t = ϕ ˜ ( ν ) + O ( Δ t ) ,
as Δ t 0 . Observe that the approximation order is lower than that of Equation (A3). Consequently, the spectral density of the binned Poisson approximation is asymptotically given by
f Δ t ( p o ) ( ν Δ t ) = f ( h w ) ( ν ) Δ t + O ( Δ t 2 ) .
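The difference between the two approximation orders can be observed numerically for the exponential kernel ϕ(t) = αβ e^(−βt), for which both the continuous-time Fourier transform and the coarse-grained coefficients are available in closed form. A sketch (parameter values and function names are ours):

```python
import numpy as np

ALPHA, BETA, NU = 0.5, 1.0, 1.0                  # illustrative parameters
phi_tilde = ALPHA * BETA / (BETA + 1j * NU)      # continuous-time FT of the kernel

def xi(k, dt):
    # xi_k for the exponential kernel, with phi(t) = alpha*(1 - exp(-beta*t))
    a, b = k * dt, (k + 1) * dt
    return ALPHA * (1 + (np.exp(-BETA * b) - np.exp(-BETA * a)) / (BETA * dt))

def fterr_cg(dt, K=2000):
    # error of the coarse-grained discrete-time FT relative to phi_tilde
    xis = np.array([xi(k, dt) for k in range(K)])
    phik = np.diff(xis, prepend=0.0)             # phi_0 = xi_0, phi_k = xi_k - xi_{k-1}
    k = np.arange(K)
    return abs(np.sum(phik * np.exp(-1j * NU * k * dt)) - phi_tilde)

def fterr_po(dt, K=2000):
    # error of the binned Poisson discretization phi_k = phi(k*dt)*dt, k >= 1
    k = np.arange(1, K)
    phik = ALPHA * BETA * np.exp(-BETA * k * dt) * dt
    return abs(np.sum(phik * np.exp(-1j * NU * k * dt)) - phi_tilde)
```

Consistent with the O(Δt²) versus O(Δt) rates, the coarse-grained error should be far smaller at a fixed bin size and should shrink roughly fourfold when the bin size is halved.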

Appendix B. Second-Order Properties of the Stationary Hawkes Process

This appendix provides a concise summary of the second-order statistical characteristics of the stationary Hawkes process. We present only those results pertinent to the analysis in this paper, and refer the reader to [22,28,29] for detailed derivations.
  • Spectral Density Matrix of the Stationary Hawkes Process:
    F ( h w ) ( ν ) = 1 2 π ( I Φ ˜ ( ν ) ) 1 diag ( λ ) ( I Φ ˜ T ( ν ) ) 1 , ν ( , ) ,
    where Φ ˜ ( ν ) = 0 Φ ( t ) e i ν t d t denotes the Fourier transform of the excitation kernel matrix, and
    λ = E [ λ ( t ) ] = ( I A ) 1 μ ,
    represents the stationary (mean) intensity.
  • Expected Value of the Binned Stationary Hawkes Process:
    E [ X k ] = λ Δ t = ( I A ) 1 μ Δ t .
  • Spectral Density Matrix of the Binned Stationary Hawkes Process:
    F Δ t ( h w ) ( ω ) = Δ t k = sinc 2 ω + 2 π k 2 F ( h w ) ω + 2 π k Δ t , ω [ π , π ) .
To evaluate (A8) in the limit as Δ t 0 , we employ the following asymptotic expansion:
sinc ν Δ t + 2 π k 2 = 1 ν 2 Δ t 2 12 + O ( Δ t 4 ) , k = 0 , ν 2 Δ t 2 4 π 2 k 2 + O ( Δ t 3 ) , k 0 .
Substituting into (A8) yields the approximation:
F Δ t ( h w ) ( ν Δ t ) = 1 ν 2 Δ t 2 12 + O ( Δ t 4 ) F ( h w ) ( ν ) Δ t + k 0 ν 2 Δ t 2 4 π 2 k 2 + O ( Δ t 3 ) F ( h w ) ν + 2 π k Δ t Δ t = F ( h w ) ( ν ) Δ t + O ( Δ t 3 ) .

Appendix C. Power-Law Distribution

A power-law distribution for waiting times is defined as
g ( t ) = β γ ( 1 + β t ) 1 + γ , t 0 ,
for β > 0 and γ > 0 . It is well-known that the moments of a power-law distribution exist and are finite for all orders strictly less than the exponent γ . In particular, the expected waiting time is finite if γ > 1 , and is given by
0 t g ( t ) d t = 1 β ( γ 1 ) .
The corresponding coarse-grained kernel is given by
g k = 1 ( 1 + β Δ ) 1 γ 1 / β ( 1 γ ) Δ , k = 0 , { 1 + β ( k + 1 ) Δ } 1 γ 2 { 1 + β k Δ } 1 γ + { 1 + β ( k 1 ) Δ } 1 γ / β ( 1 γ ) Δ , k = 1 , 2 , ,
for γ 1 , and
g k = 1 log ( 1 + β Δ ) / β Δ , k = 0 , log { 1 + β ( k + 1 ) Δ } 2 log { 1 + β k Δ } + log { 1 + β ( k 1 ) Δ } / β Δ , k = 1 , 2 , ,
for γ = 1 .
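The expressions above can be checked numerically: for γ > 1 the coarse-grained kernel g_k should be nonnegative and sum to one. A minimal sketch via the telescoping ξ_k differences (illustrative parameters; here Δ denotes the bin size Δ t):

```python
BETA, GAMMA, DT = 1.0, 2.0, 1.0      # power-law parameters, gamma != 1 branch

def xi(k):
    # xi_k = 1 - [ (1 + beta*(k+1)*dt)^{1-gamma} - (1 + beta*k*dt)^{1-gamma} ]
    #            / (beta * (1 - gamma) * dt)
    hi = (1 + BETA * (k + 1) * DT) ** (1 - GAMMA)
    lo = (1 + BETA * k * DT) ** (1 - GAMMA)
    return 1 - (hi - lo) / (BETA * (1 - GAMMA) * DT)

# coarse-grained kernel: g_0 = xi_0, g_k = xi_k - xi_{k-1} for k >= 1
g = [xi(0)] + [xi(k) - xi(k - 1) for k in range(1, 5000)]
```

For β = 1, γ = 2, Δ = 1, the first coefficient evaluates to g_0 = 1/2, and the partial sums telescope to ξ_n → 1, reflecting that the coarse-grained kernel of a normalized PDF is itself a probability distribution over lags.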
Figure A1 illustrates the power spectral density (PSD), cross-spectral density (CSD), auto-covariance, and cross-covariance of the binned Hawkes process (blue dotted lines), the coarse-grained Hawkes process (red lines), and the binned Poisson approximation (green lines), respectively. Figure A2 presents the information loss associated with the coarse-grained Hawkes process (red lines) and the binned Poisson approximation (green lines), respectively. These results are qualitatively consistent with those obtained using the exponential kernel, as shown in Figure 1 and Figure 2.
Figure A1. Same as Figure 1, but for the power-law kernel. Parameters of the Hawkes process are set to μ = 1 , α ( s ) = 0.4 , α ( c ) = 0.3 , β = 1 , and γ = 2 . (a) Δ t = 0.1, (b) Δ t = 1, (c) Δ t = 2.
Figure A3, Figure A4, Figure A5 and Figure A6 summarize the results of parameter estimation in scenarios where the excitation kernels follow power-law distributions. The MC-EM algorithm was omitted, as the power-law kernel is not supported by the publicly available implementation [25]. As with the exponential kernel case shown in Figure 3, Figure 4, Figure 5 and Figure 6, the proposed method consistently outperforms both the binned Poisson MLE and the INAR(p) method, particularly with respect to reducing estimation bias.
Figure A2. Same as Figure 2, but for the power-law kernel. Parameters are set to μ = 1 , β = 1 , and γ = 2 .
Figure A3. Boxplots of the estimated values for each of the ten model parameters. Same as Figure 3, but based on power-law kernels. The power-law exponent is set to γ = 2 , with all other parameters identical to those used in the exponential kernel scenario.
Figure A4. Root mean squared error (RMSE) of the parameter estimates. Same as Figure 4, but evaluated using power-law kernels.
Figure A5. Bias in the estimated parameters. Same as Figure 5, but for the power-law kernel setting.
Figure A6. Standard deviation (STD) of the parameter estimates. Same as Figure 6, but for the power-law kernel case.

References

  1. Hawkes, A.G. Spectra of some self-exciting and mutually exciting point processes. Biometrika 1971, 58, 83–90. [Google Scholar] [CrossRef]
  2. Hawkes, A.G. Point spectra of some mutually exciting point processes. J. R. Stat. Soc. Ser. B (Methodol.) 1971, 33, 438–443. [Google Scholar] [CrossRef]
  3. Adamopoulos, L. Cluster models for earthquakes: Regional comparisons. J. Int. Assoc. Math. Geol. 1976, 8, 463–475. [Google Scholar] [CrossRef]
  4. Ogata, Y. Statistical models for earthquake occurrences and residual analysis for point processes. J. Am. Stat. Assoc. 1988, 83, 9–27. [Google Scholar] [CrossRef]
  5. Chornoboy, E.S.; Schramm, L.P.; Karr, A.F. Maximum likelihood identification of neural point process systems. Biol. Cybern. 1988, 59, 265–275. [Google Scholar] [CrossRef]
  6. Pernice, V.; Staude, B.; Cardanobile, S.; Rotter, S. How structure determines correlations in neuronal networks. PLoS Comput. Biol. 2011, 7, e1002059. [Google Scholar] [CrossRef]
  7. Reynaud-Bouret, P.; Schbath, S. Adaptive estimation for Hawkes processes; application to genome analysis. Ann. Stat. 2010, 38, 2781–2822. [Google Scholar] [CrossRef]
  8. Bacry, E.; Mastromatteo, I.; Muzy, J.F. Hawkes processes in finance. Mark. Microstruct. Liq. 2015, 1, 1550005. [Google Scholar] [CrossRef]
  9. Hawkes, A.G. Hawkes processes and their applications to finance: A review. Quant. Financ. 2018, 18, 193–198. [Google Scholar] [CrossRef]
  10. Fox, E.W.; Short, M.B.; Schoenberg, F.P.; Coronges, K.D.; Bertozzi, A.L. Modeling E-mail Networks and Inferring Leadership Using Self-Exciting Point Processes. J. Am. Stat. Assoc. 2016, 111, 564–584. [Google Scholar] [CrossRef]
  11. Kobayashi, R.; Lambiotte, R. TiDeH: Time-dependent Hawkes process for predicting retweet dynamics. In Proceedings of the International AAAI Conference on Web and Social Media, Cologne, Germany, 17–20 May 2016; Volume 10, pp. 191–200. [Google Scholar]
  12. Koyama, S.; Shinomoto, S. Statistical physics of discovering exogenous and endogenous factors in a chain of events. Phys. Rev. Res. 2020, 2, 043358. [Google Scholar] [CrossRef]
  13. Mohler, G.; Short, M.B.; Brantingham, P.J.; Schoenberg, F.P.; Tita, G.E. Self-Exciting Point Process Modeling of Crime. J. Am. Stat. Assoc. 2011, 106, 100–108. [Google Scholar] [CrossRef]
  14. Zhuang, J.; Mateu, J. A semiparametric spatiotemporal Hawkes-type point process model with periodic background for crime data. J. R. Stat. Soc. Ser. A (Stat. Soc.) 2019, 182, 919–942. [Google Scholar] [CrossRef]
  15. Lewis, E.; Mohler, G.; Brantingham, P.J.; Bertozzi, A.L. Self-exciting point process models of civilian deaths in Iraq. Secur. J. 2012, 25, 244–264. [Google Scholar] [CrossRef]
  16. Kalair, K.; Connaughton, C.; Loro, P.A.D. A non-parametric Hawkes process model of primary and secondary accidents on a UK smart motorway. J. R. Stat. Soc. Ser. C (Appl. Stat.) 2021, 70, 80–97. [Google Scholar] [CrossRef]
  17. Kirchner, M. Hawkes and INAR() processes. Stoch. Processes Their Appl. 2016, 126, 2494–2525. [Google Scholar] [CrossRef]
  18. Kirchner, M. An estimation procedure for the Hawkes process. Quant. Financ. 2017, 17, 571–595. [Google Scholar] [CrossRef]
  19. Shlomovich, L.; Cohen, E.A.K.; Adams, N.; Patel, L. Parameter estimation of binned Hawkes processes. J. Comput. Graph. Stat. 2022, 31, 990–1000. [Google Scholar] [CrossRef]
  20. Shlomovich, L.; Cohen, E.A.K.; Adams, N. A parameter estimation method for multivariate binned Hawkes processes. Stat. Comput. 2022, 32, 98. [Google Scholar] [CrossRef]
  21. Chen, F.; Kwan, T.K.J.; Stindl, T. Estimating the Hawkes Process From a Discretely Observed Sample Path. J. Comput. Graph. Stat. 2025, 1–13. [Google Scholar] [CrossRef]
  22. Cheysson, F.; Lang, G. Spectral estimation of Hawkes processes from count data. Ann. Stat. 2022, 50, 1722–1746. [Google Scholar] [CrossRef]
  23. Daley, D.; Vere-Jones, D. An Introduction to the Theory of Point Processes Volume II: General Theory and Structure, 2nd ed.; Springer: New York, NY, USA, 2008. [Google Scholar]
  24. Mark, B.; Raskutti, G.; Willett, R. Network estimation from point process data. IEEE Trans. Inf. Theory 2019, 65, 2953–2975. [Google Scholar] [CrossRef]
  25. Shlomovich, L. MATLAB Code for Multivariate Implementation of Aggregated Hawkes Parameter Estimation. Available online: https://github.com/lshlomovich/MCEM_Multivariate_Hawkes (accessed on 11 February 2025).
  26. Durbin, J.; Koopman, S. Time Series Analysis by State Space Methods; Oxford University Press: Oxford, UK, 2001. [Google Scholar]
  27. Kitagawa, G. Introduction to Time Series Modeling; Chapman and Hall/CRC: Boca Raton, FL, USA, 2010. [Google Scholar]
  28. Bacry, E.; Dayri, K.; Muzy, J.F. Non-parametric kernel estimation for symmetric Hawkes processes. Application to high frequency financial data. Eur. Phys. J. B 2012, 85, 157. [Google Scholar] [CrossRef]
  29. Bacry, E.; Muzy, J.F. First- and second-order statistics characterization of Hawkes processes and non-parametric estimation. IEEE Trans. Inf. Theory 2016, 62, 2184–2202. [Google Scholar] [CrossRef]
Figure 1. Power spectral density (PSD), cross-spectral density (CSD), auto-covariance, and cross-covariance functions for the three processes at bin sizes Δ t = 0.1 (a), 1 (b), and 2 (c). The blue dotted line corresponds to the binned Hawkes process, while the red and green lines depict the coarse-grained Hawkes process and the binned Poisson approximation, respectively. The parameters of the Hawkes process are set as μ = 1 , α ( s ) = 0.4 , α ( c ) = 0.3 , and β = 1 . The coarse-grained Hawkes process provides a close approximation to the binned Hawkes process, whereas the binned Poisson approximation exhibits noticeable degradation for Δ t = 1 and 2.
Figure 2. Information loss as a function of Δ t for the coarse-grained Hawkes process (red line) and the binned Poisson approximation (green line). The parameters are set to μ = 1 and β = 1 . The information loss associated with the binned Poisson approximation increases with larger values of Δ t , α ( s ) , and α ( c ) , whereas the information loss incurred by the coarse-grained Hawkes process remains negligible across all configurations.
Figure 3. Boxplots of the estimated values for each of the ten model parameters. The green solid line indicates the ground truth values. Note that the INAR(p) method may yield negative values, which are omitted when log-scaled axes are used.
Figure 4. Root mean squared error (RMSE) of the parameter estimates. Overall, the RMSE of the proposed method is comparable to that of the MC-EM algorithm, and significantly lower than that of the binned Poisson approximation and the INAR(p) method.
Figure 5. Bias in the estimated parameters. The proposed method consistently achieves the lowest bias among the four methods, except for a few isolated cases.
Figure 6. Standard deviation (STD) in the parameter estimates. The MC-EM algorithm generally attains the lowest standard deviation among the four methods.
Figure 7. PDFs and their coarse-grained counterparts for Δ t = 0.5 , 1, and 2. The mean of all PDFs is 1, and the standard deviation (STD) is (a) 0.5 , (b) 1, and (c) 1.5 . Note that the gamma and Weibull distributions converge to the exponential distribution when STD = 1 . Additionally, the power-law distribution is not defined for STD 1 .