J
  • Article
  • Open Access

14 October 2025

Modeling the Mutual Dynamic Correlations of Words in Written Texts Using Multivariate Hawkes Processes

Faculty of Arts and Sciences at FUJIYOSHIDA, Showa Medical University, 4562 Kamiyoshida, Fujiyoshida-shi 403-0005, Japan
*
Author to whom correspondence should be addressed.
This article belongs to the Special Issue Feature Papers of J—Multidisciplinary Scientific Journal in 2025

Abstract

The occurrence patterns of important words found in six texts (one historical pamphlet and five renowned academic books) are analyzed using both univariate and multivariate Hawkes processes. By treating the occurrence patterns as binary time-series data along the texts, we investigate how effectively univariate and multivariate Hawkes processes capture the characteristics of these word occurrence signals. Through maximum likelihood estimation and subsequent simulations, we found that the multivariate Hawkes process clearly outperforms the univariate Hawkes process in modeling word occurrence signals. Moreover, we found that the multivariate Hawkes process can provide a Hawkes graph, which serves as an intuitive representation of the relationships between concepts appearing in the analyzed text. Furthermore, our study demonstrates that the importance of concepts within a given text can be quantitatively estimated based on the optimized parameter values of the multivariate Hawkes process.

1. Introduction

Analyzing document data as a time series has been a commonly employed approach in the past [1,2,3,4,5,6]. One notable advantage of representing documents as time-series data is the ability to quantitatively capture long-range correlations among various components of the documents [7,8,9,10,11,12]. This advantage stems from the application of mathematical tools in time-series analysis, such as autocorrelation functions [13,14,15], waiting time distributions [16,17], and similar methods. For instance, autocorrelation functions can be utilized to determine the significance of words within a given document [13]. Words exhibiting strong long-range autocorrelation across a text are often considered to be closely related to the document’s central theme [13,14,15,16,17].
In a previous study [17], we utilized univariate Hawkes processes to model word occurrence patterns in texts, demonstrating their effectiveness in capturing the dynamic, long-range correlations between different positions in a considered text. Hawkes processes are a type of stochastic process designed to model events occurring over time, influenced by prior occurrences [18]. They are particularly effective at capturing self-exciting phenomena, where the occurrence of one event increases the probability of subsequent events, whether in the short-term or long-term [19,20,21,22,23,24,25,26,27,28]. This characteristic makes Hawkes processes highly suitable for describing the occurrences of significant words in a text that are associated with specific concepts or ideas. Such key words often reappear during the explanation of crucial concepts. In other words, important words related to the subject matter of the text exhibit self-excitatory behavior, correlating with their own past occurrence signals. As a result, their patterns can be effectively modeled using Hawkes processes. This was confirmed in our previous study [17], which showed that many keywords in famous academic books follow such patterns.
However, the methodology proposed in the previous study [17] has certain limitations. Specifically, it employed univariate Hawkes processes, which can capture autocorrelation—measuring how a word’s occurrence signal is correlated with itself over different time lags. However, this approach cannot account for cross-correlation, which quantifies the correlation between the occurrence signals of two different words as a function of a time lag. It is common for significant concepts in documents to be explained from multiple perspectives, using several key terms. In such cases, these key terms should be interrelated or correlated, such that they mutually influence one another’s appearance within the document.
To characterize such documents, the model used should be capable of capturing the mutual correlations among keywords. This requirement underscores the necessity of multivariate Hawkes processes [29,30,31] for text modeling. As will be elaborated later, multivariate Hawkes processes can effectively represent multiple, mutually correlated stochastic processes, such as the occurrence signals of several keywords within a text. The objective of this study is to employ multivariate Hawkes processes to model documents. To the best of our knowledge, only a few studies have explored the application of multivariate Hawkes processes to document analysis [32,33]. However, their primary focus lies in modeling relationships between documents—such as interactions among social media posts or between news articles—rather than analyzing the internal structure of a single document, which is the central theme of the present study. We will demonstrate in this study that employing multivariate Hawkes processes, as opposed to univariate ones, significantly enhances the accuracy in describing word occurrence signals. Moreover, we discovered that Hawkes graphs [34], which are graphical representations of multivariate Hawkes processes, are highly effective in intuitively conveying the contents of analyzed documents. These graphs sensitively depict the interrelationships among multiple keywords appearing in the document. From the graph, it is possible to identify a central keyword that is closely tied to the document’s theme, as well as several other keywords used to describe or introduce the central concept. While Hawkes graphs may appear similar to diagrams depicting word co-occurrence networks [35,36], they are more advantageous as they incorporate information about causal relationships and dependencies among keywords. 
Thus, another significant benefit of utilizing the multivariate Hawkes process lies in its ability to provide an intuitive representation of document content through Hawkes graphs.
The remainder of this paper is organized as follows. In the next section, we outline the methodology, describing the characteristics of the univariate and multivariate Hawkes processes employed, including their kernel functions and log-likelihood functions. Additionally, we explain how word occurrence signals were extracted from each document and how these signals were modeled using the univariate and multivariate Hawkes processes. This section also details the simulation procedures used to validate our modeling approach with both types of Hawkes processes. The subsequent section presents our results, highlighting the advantages of multivariate Hawkes processes over univariate ones. This section also elaborates on the construction and practical aspects of Hawkes graphs. Finally, in the concluding section, we summarize our findings and propose directions for future research.

2. Methodology

2.1. Converting Text as Time-Series Data

To treat written texts as time-series data, we assign a serial number to every sentence in a given document and let the sentence number play the role of time [13,14,15,16,17]. The occurrence signal of a given word is then defined as
$$x(t) = \begin{cases} 0 & \text{when the word does not appear in the } t\text{-th sentence} \\ 1 & \text{when the word appears in the } t\text{-th sentence,} \end{cases} \tag{1}$$
which is a binary variable expressing the word occurrence event. The accumulated value of $x(t)$, i.e.,
$$N(s) = \sum_{t=1}^{s} x(t), \tag{2}$$
is a counting process for the word occurrence event; that is, $N(s)$ represents the number of occurrences of the word up to the $s$-th sentence. We treat $x(t)$ and $N(s)$ as time-series data and seek a stochastic process that describes their behavior well. As shown in earlier studies [13,14,15,16], words that are not directly connected to the theme of a given document, and therefore exhibit no dynamic correlations, can be accurately modeled using either a homogeneous or an inhomogeneous Poisson process. In this paper, we investigate stochastic processes that characterize $x(t)$ and $N(s)$ for significant words, which are closely related to the document’s theme and therefore have long-range dynamic correlations.
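The construction of the occurrence signal and its counting process can be sketched in a few lines of Python. This is a minimal illustration, not the preprocessing pipeline used in the study: the regex-based sentence splitter, the case-insensitive whole-word match, and the function names `split_sentences`, `occurrence_signal`, and `counting_process` are all assumptions made for the example.

```python
import re
import numpy as np

def split_sentences(text):
    """Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def occurrence_signal(sentences, word):
    """Binary signal x(t): 1 if `word` appears in the t-th sentence, else 0."""
    pattern = re.compile(r"\b" + re.escape(word) + r"\b", flags=re.IGNORECASE)
    return np.array([1 if pattern.search(s) else 0 for s in sentences], dtype=int)

def counting_process(x):
    """Counting process N(s): running sum of x(t) for t = 1..s."""
    return np.cumsum(x)

# Toy text, invented for illustration only.
text = ("Natural selection acts slowly. Species vary in nature. "
        "Selection preserves useful variations. The climate also matters.")
sentences = split_sentences(text)
x = occurrence_signal(sentences, "selection")
N = counting_process(x)
# Event times are the 1-based sentence numbers at which the word occurs.
event_times = np.flatnonzero(x) + 1
```

The array `event_times` is the list of event times that a Hawkes process is fitted to, with the total number of sentences serving as the observation horizon.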

2.2. Maximum Likelihood Estimation of Hawkes Processes

To illustrate our methodology, we briefly introduce the intensity and likelihood functions of univariate and multivariate Hawkes processes, which we consider to be suitable stochastic models for explaining the observed word occurrence signals, Equation (1), for real words in the analyzed text.
The univariate Hawkes process is mathematically defined by its intensity function, $\lambda(t)$, representing the conditional event rate (word occurrence rate) at time $t$. For a process with an exponential decay kernel, the intensity function is given as [37,38]:
$$\lambda(t) = \mu + \sum_{t_i < t} \alpha \exp\left(-\beta (t - t_i)\right), \tag{3}$$
where $\mu$ denotes the baseline intensity, $t_i$ represents the occurrence times of previous events ($t_i < t$), $\alpha$ is a positive parameter quantifying the magnitude of self-enhancement, and $\beta$ is a positive parameter characterizing the rate at which the effect of self-enhancement decays. Given that event times $\{t_1, t_2, t_3, \ldots, t_n\}$ are observed within a time interval $[0, T]$, where $T$ denotes the total number of sentences in the text, the log-likelihood function of the univariate Hawkes process is expressed as [37,38]:
$$l(\mu, \alpha, \beta) = \sum_{i=1}^{n} \log\left(\mu + \sum_{t_j < t_i} \alpha \exp\left(-\beta (t_i - t_j)\right)\right) - \mu T - \frac{\alpha}{\beta} \sum_{i=1}^{n} \left[1 - \exp\left(-\beta (T - t_i)\right)\right]. \tag{4}$$
In the framework of maximum likelihood estimation (MLE), μ , α and β are treated as fitting parameters to adapt the univariate Hawkes process to the observed sequence of event times.
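As a rough illustration of Equation (4), the log-likelihood of the univariate process with an exponential kernel can be evaluated in a single pass over the event times. The sketch below assumes strictly increasing event times and uses the standard recursion for the inner sum; the function name `hawkes_loglik` is hypothetical and this is not necessarily how the likelihood was coded in the study.

```python
import numpy as np

def hawkes_loglik(params, times, T):
    """Log-likelihood of Equation (4) for a univariate exponential-kernel Hawkes process.

    params = (mu, alpha, beta); `times` must be strictly increasing and lie in [0, T].
    """
    mu, alpha, beta = params
    times = np.asarray(times, dtype=float)
    A = 0.0          # A_i = sum over t_j < t_i of exp(-beta * (t_i - t_j))
    log_sum = 0.0
    prev = None
    for t in times:
        if prev is not None:
            # Recursion: A_i = exp(-beta * (t_i - t_{i-1})) * (1 + A_{i-1})
            A = np.exp(-beta * (t - prev)) * (1.0 + A)
        log_sum += np.log(mu + alpha * A)
        prev = t
    # Integral of the intensity over [0, T] (the compensator at T).
    compensator = mu * T + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (T - times)))
    return log_sum - compensator
```

The recursion avoids the double sum over pairs of events, so the cost grows linearly with the number of occurrences of the word.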
The multivariate Hawkes process expands upon the univariate Hawkes process by introducing multiple dimensions, allowing it to capture interactions among various types of events. In our case, we anticipate that the multivariate Hawkes process can effectively model the cross-correlation between word occurrence signals of different words within a given text. If we select $d$ words as the focus of our analysis and consider the $i$-th word, the intensity function of the multivariate Hawkes process for events of type $i$ (i.e., the intensity function for the occurrence of the $i$-th word) is defined as [38,39]:
$$\lambda_i(t) = \mu_i + \sum_{j=1}^{d} \sum_{k:\, t_k^j < t} \alpha_{ij} \exp\left(-\beta_{ij} (t - t_k^j)\right), \tag{5}$$
where $\mu_i$ is the baseline intensity for events of type $i$, $d$ is the total number of event types (the total number of analyzed words), $t_k^j$ represents the time of the $k$-th event of type $j$ (i.e., the $k$-th occurrence of the $j$-th word), $\alpha_{ij}$ denotes the influence strength of events of type $j$ on those of type $i$, and $\beta_{ij}$ denotes the decay rate of excitation induced by the occurrence of the $j$-th word on the subsequent occurrence of the $i$-th word.
The log-likelihood function for the multivariate Hawkes process is expressed as:
$$l(\boldsymbol{\mu}, \boldsymbol{\alpha}, \boldsymbol{\beta}) = \sum_{i=1}^{d} \sum_{k=1}^{n_i} \log\left(\mu_i + \sum_{j=1}^{d} \sum_{m:\, t_m^j < t_k^i} \alpha_{ij} \exp\left(-\beta_{ij} (t_k^i - t_m^j)\right)\right) - \sum_{i=1}^{d} \left\{\mu_i T + \sum_{j=1}^{d} \sum_{m=1}^{n_j} \frac{\alpha_{ij}}{\beta_{ij}} \left[1 - \exp\left(-\beta_{ij} (T - t_m^j)\right)\right]\right\}. \tag{6}$$
Here, $n_i$ represents the number of occurrences of events of type $i$; $\boldsymbol{\mu}$ is the baseline intensity vector, defined as $\boldsymbol{\mu} = (\mu_1, \mu_2, \ldots, \mu_d)$; $\boldsymbol{\alpha} = \{\alpha_{ij}\}$ and $\boldsymbol{\beta} = \{\beta_{ij}\}$ are matrices comprising the fitting parameters introduced in Equation (5); and $T$ is the total number of sentences in the text. Equation (6) can be derived in the following manner. According to the general theory of point processes, the likelihood function of the multivariate Hawkes process is given by [40] (Chapter 7):
$$L(\boldsymbol{\mu}, \boldsymbol{\alpha}, \boldsymbol{\beta}) = \prod_{i=1}^{d} \left[\prod_{k=1}^{n_i} \lambda_i(t_k^i)\right] \exp\left(-\int_0^T \lambda_i(t)\, dt\right), \tag{7}$$
where $\lambda_i(t)$ is defined by Equation (5). By substituting Equation (5) into Equation (7), taking the logarithm of both sides, and carrying out the integral, we obtain the log-likelihood function given in Equation (6).
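The integral step can be written out explicitly. The following is a short worked version of that calculation, using only the symbols of Equations (5) to (7). Taking the logarithm of the likelihood turns the products into sums, and the integral of the intensity splits into a baseline part and one exponential integral per past event:

```latex
\begin{align}
\log L &= \sum_{i=1}^{d} \left[ \sum_{k=1}^{n_i} \log \lambda_i(t_k^i)
          - \int_0^T \lambda_i(t)\, dt \right], \\
\int_0^T \lambda_i(t)\, dt
  &= \mu_i T + \sum_{j=1}^{d} \sum_{m=1}^{n_j} \alpha_{ij}
     \int_{t_m^j}^{T} \exp\left(-\beta_{ij}(t - t_m^j)\right) dt \\
  &= \mu_i T + \sum_{j=1}^{d} \sum_{m=1}^{n_j} \frac{\alpha_{ij}}{\beta_{ij}}
     \left[ 1 - \exp\left(-\beta_{ij}(T - t_m^j)\right) \right].
\end{align}
```

Each event at $t_m^j$ contributes to $\lambda_i(t)$ only for $t > t_m^j$, which is why the lower limit of the inner integral is $t_m^j$ rather than $0$.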
In the framework of MLE for univariate Hawkes processes, the goal is to find optimized parameters $\mu$, $\alpha$ and $\beta$ that maximize the log-likelihood function, Equation (4), given a list of event times $\{t_1, t_2, t_3, \ldots, t_n\}$. Similarly, for multivariate Hawkes processes, the objective is to optimize the vector $\boldsymbol{\mu}$ and matrices $\boldsymbol{\alpha}$ and $\boldsymbol{\beta}$ to maximize the log-likelihood function, Equation (6), given lists of event times for each type of event, $\{t_1^i, t_2^i, \ldots, t_{n_i}^i\}$ for $1 \le i \le d$. In practical MLE procedures, we used the minimize() function from the Python (version 3.11.5) library scipy.optimize (version 1.11.1) to determine the optimal parameter values by minimizing the negative log-likelihoods $-l(\mu, \alpha, \beta)$ and $-l(\boldsymbol{\mu}, \boldsymbol{\alpha}, \boldsymbol{\beta})$. The L-BFGS-B (Limited-memory Broyden–Fletcher–Goldfarb–Shanno with Box constraints) method was employed because it allows upper and lower bounds to be set for the fitting parameters, ensuring that the solutions remain within the specified ranges.
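A minimal sketch of this procedure for a small multivariate process is shown below. It packs the baseline vector and the two matrices into one flat parameter vector and passes the negative of Equation (6) to `scipy.optimize.minimize` with `method="L-BFGS-B"`. The starting values, the bound ranges, the toy event times, and the helper names `unpack`, `neg_loglik`, and `fit_hawkes` are assumptions for illustration; the bounds actually used in the study are not stated here.

```python
import numpy as np
from scipy.optimize import minimize

def unpack(theta, d):
    """Split the flat parameter vector into mu (d), alpha (d x d), beta (d x d)."""
    mu = theta[:d]
    alpha = theta[d:d + d * d].reshape(d, d)
    beta = theta[d + d * d:].reshape(d, d)
    return mu, alpha, beta

def neg_loglik(theta, events, T):
    """Negative of Equation (6); events[j] is a float array of type-j event times."""
    d = len(events)
    mu, alpha, beta = unpack(np.asarray(theta, dtype=float), d)
    ll = 0.0
    for i in range(d):
        lam = np.full(len(events[i]), mu[i], dtype=float)
        for j in range(d):
            # lag[k, m] = t_k^i - t_m^j; only strictly earlier events excite.
            lag = events[i][:, None] - events[j][None, :]
            decay = np.exp(-beta[i, j] * np.clip(lag, 0.0, None))
            lam += np.where(lag > 0, alpha[i, j] * decay, 0.0).sum(axis=1)
            # Contribution of type-j events to the integral of lambda_i over [0, T].
            ll -= alpha[i, j] / beta[i, j] * np.sum(1.0 - np.exp(-beta[i, j] * (T - events[j])))
        ll += np.sum(np.log(lam)) - mu[i] * T
    return -ll

def fit_hawkes(events, T):
    """Maximize Equation (6) under box constraints (bounds here are illustrative)."""
    events = [np.asarray(e, dtype=float) for e in events]
    d = len(events)
    theta0 = np.concatenate([np.full(d, 0.05), np.full(d * d, 0.1), np.full(d * d, 1.0)])
    bounds = [(1e-6, 10.0)] * d + [(0.0, 10.0)] * (d * d) + [(1e-3, 50.0)] * (d * d)
    res = minimize(neg_loglik, theta0, args=(events, T), method="L-BFGS-B", bounds=bounds)
    mu, alpha, beta = unpack(res.x, d)
    return mu, alpha, beta, res

# Toy event times for two words (invented for illustration), T = 40 sentences.
events = [[2, 3, 10, 11, 30], [1, 9, 29]]
mu_hat, alpha_hat, beta_hat, result = fit_hawkes(events, T=40)
```

With $d = 1$ the same function reduces to the univariate case of Equation (4). The pairwise lag matrices make this version quadratic in the number of events, which is acceptable for a sketch but is one reason high-dimensional fits become expensive.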

2.3. Selecting Important Words in Used Texts

The six texts employed in this study are listed in Table 1. One of them is a famous historical pamphlet (“Common Sense” by Thomas Paine); the other five are well-known academic books. They were chosen to represent a wide range of written texts. The table also gives a short name and some basic information for each text. The preface, contents and index pages were deleted before text preprocessing because they may act as noise and affect the final results.
Table 1. Summary of English texts employed.
We select 20 important words from each of the six texts and analyze their occurrence signals using the univariate and the multivariate Hawkes processes. This means that 20 pairs of Equations (3) and (4) are considered for modeling the 20 words with univariate Hawkes processes, while we set $d = 20$ in Equations (5) and (6) for modeling with the multivariate Hawkes process. The selected 20 important words having long-range dynamic correlations are listed in Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7. Note that only nouns are included in the tables, as this study aims to investigate the causal relationships between “concepts” represented by nouns within the document (e.g., word “A” is used to describe word “B”). This approach is feasible because we primarily analyze academic books, where a word that frequently occurs with strong dynamic correlations is assumed to carry a consistent and well-defined meaning. In this context, a word exhibiting strong and long-range dynamic correlations is regarded as representing a coherent, singular “concept.” The investigation of relationships between “concepts” is based on correlations extracted through multivariate Hawkes process analysis.
Table 2. The top 20 important words for Darwin text.
Table 3. The top 20 important words for Einstein text.
Table 4. The top 20 important words for Faraday text.
Table 5. The top 20 important words for Freud text.
Table 6. The top 8 important words for Paine text.
Table 7. The top 20 important words for Plato text.
The accuracy of representing word occurrence signals using the multivariate Hawkes process improves as the number of analyzed words increases. This is because a larger set of words allows for the consideration of all possible event causes that mutually excite one another. In most documents, key concepts are explained from multiple perspectives, utilizing numerous words to convey complex ideas. Such texts often exhibit correlations where many words influence each other.
In a multivariate Hawkes process, each dimension represents the occurrence signal of a single word. Consequently, limiting the process to only 20 dimensions restricts its ability to capture correlations among words, making it insufficient for describing all the cross-correlations commonly found in texts. However, the maximum likelihood estimation (MLE) of a multivariate Hawkes process requires extensive convergence calculations, making it computationally intensive and practical only for about 20 dimensions or fewer. Therefore, establishing clear selection criteria for the 20 key words, as outlined below, is critical to this study.
First, we calculated the autocorrelation functions (ACFs) for all words that appear more than 50 times in each text, using the formula:
$$\phi(s) = \frac{\dfrac{1}{T-s} \displaystyle\sum_{t=1}^{T-s} \left(x(t) - \bar{x}\right)\left(x(t+s) - \bar{x}\right)}{\dfrac{1}{T} \displaystyle\sum_{t=1}^{T} \left(x(t) - \bar{x}\right)^2}, \tag{8}$$
where $\bar{x}$ represents the mean value of $x(t)$. Then, these ACFs were fitted by a Kohlrausch–Williams–Watts (KWW) function:
$$\phi(t) = \exp\left[-\left(\frac{t}{\tau}\right)^{\beta}\right], \tag{9}$$
where $\tau$ (a relaxation time) and $\beta$ (a shape parameter) are fitting parameters that satisfy $\tau > 0$ and $0 < \beta \le 1$. (This $\beta$ is a parameter of the KWW function and is distinct from the decay rate $\beta$ of the Hawkes kernel in Equation (3).) The KWW function has been demonstrated to effectively represent real ACFs of important words with long-range dynamic correlations [13,14,15,16,17]. Next, we evaluated the mean relaxation time $\langle\tau\rangle$ for each word using the equation:
$$\langle\tau\rangle = \int_0^{\infty} \exp\left[-\left(\frac{t}{\tau}\right)^{\beta}\right] dt = \frac{\tau}{\beta}\, \Gamma\left(\frac{1}{\beta}\right), \tag{10}$$
where $\Gamma(x)$ denotes the gamma function. Finally, we select the top 20 words in terms of $\langle\tau\rangle$ as the key words to be analyzed for each document. A large mean relaxation time, $\langle\tau\rangle$, indicates that a word exhibits both strong dynamic correlations and long-range memory, making it relevant to the key concept or theme of the text. Therefore, we consider words with large $\langle\tau\rangle$ to be significant. In Table 2, Table 3, Table 4, Table 5, Table 6 and Table 7, the values of the fitting parameters, $\tau$ and $\beta$, and the mean relaxation time, $\langle\tau\rangle$, are shown for each of the selected words. Note that for the Paine text seen in Table 6, only 8 words are selected, as the text is significantly shorter than the others. We relaxed the criterion to 45 or more occurrences instead of 50 or more, but only eight words met that criterion for the Paine text.
Figure 1 shows the word occurrence signal $x(t)$, the calculated ACF and the fitted KWW function for the word “formation” in the Darwin text. The optimized values of the fitting parameters used in Figure 1b are $\tau = 0.4455$ and $\beta = 0.1427$, and the resultant value of the objective function (sum of squared residuals) is 0.24732. In fitting the KWW function to ACFs, we employed a nonlinear least-squares algorithm to optimize the fitting parameters $\tau$ and $\beta$. For practical calculations, we used the curve_fit() function from the scipy.optimize library. This function iteratively minimizes the objective function, defined as the sum of squared residuals (namely, the squared differences between the ACF and the KWW model), by adjusting the fitting parameters. The iterative process terminates when the change in the objective function between two successive steps falls below a predefined threshold of $1.0 \times 10^{-8}$. Upon completion of the iteration, we confirmed that the values of the objective function (sum of squared residuals) are less than 0.3 for all important words.
Figure 1. (a) The word occurrence signal, x ( t ) , for the word “formation” in the Darwin text, and (b) the calculated ACFs (circles) along with the fitted KWW function to the ACFs (red curve) for the word “formation” in the Darwin text.
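The selection criterion can be sketched as follows: compute the ACF of Equation (8), fit the KWW function of Equation (9) with `scipy.optimize.curve_fit`, and evaluate the mean relaxation time of Equation (10). The starting values, the bounds, the use of lags starting at 1, and the function names `acf`, `kww`, `fit_kww`, and `mean_relaxation_time` are assumptions for illustration; the `ftol` argument is passed through to the underlying least-squares solver and is only an approximate stand-in for the stopping threshold described in the text.

```python
import numpy as np
from scipy.optimize import curve_fit
from scipy.special import gamma

def acf(x, max_lag):
    """Autocorrelation function of Equation (8) for lags s = 1..max_lag."""
    x = np.asarray(x, dtype=float)
    T = len(x)
    dev = x - x.mean()
    var = np.mean(dev ** 2)
    return np.array([np.sum(dev[:T - s] * dev[s:]) / (T - s) / var
                     for s in range(1, max_lag + 1)])

def kww(t, tau, beta):
    """KWW (stretched exponential) function of Equation (9)."""
    return np.exp(-(t / tau) ** beta)

def fit_kww(lags, phi):
    """Least-squares fit of tau > 0 and 0 < beta <= 1 (bounds are illustrative)."""
    popt, _ = curve_fit(kww, lags, phi, p0=(1.0, 0.5),
                        bounds=([1e-6, 1e-3], [np.inf, 1.0]), ftol=1e-8)
    return popt[0], popt[1]

def mean_relaxation_time(tau, beta):
    """Mean relaxation time of Equation (10): (tau / beta) * Gamma(1 / beta)."""
    return tau / beta * gamma(1.0 / beta)

# Synthetic check: recover known parameters from a noise-free KWW curve.
lags = np.arange(1, 51, dtype=float)
phi = kww(lags, 5.0, 0.6)
tau_hat, beta_hat = fit_kww(lags, phi)
tau_mean = mean_relaxation_time(tau_hat, beta_hat)
```

Ranking words by `tau_mean` and keeping the largest values corresponds to the selection rule described above. Because $\Gamma(1/\beta)$ grows very quickly as $\beta$ decreases, small shape parameters can produce mean relaxation times far larger than $\tau$ itself.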
Of course, selecting words solely on the basis of $\langle\tau\rangle$, as evaluated from the autocorrelation function, has its limitations, as it overlooks semantic significance. Nevertheless, we do not apply semantic analysis here, since semantically important words with weak dynamic correlations are not well suited to the stochastic process modeling used in this study. The relationship between dynamic correlations and semantic significance remains a compelling topic, deserving further exploration in future research.

2.4. Validation of Modeling with Hawkes Processes

In this study, we use univariate and multivariate Hawkes processes to model word occurrence signals in the considered texts. The validity of the modeling is confirmed by following two different procedures.
The first procedure is the quantile–quantile (Q–Q) plot. The Q–Q plot is a widely used diagnostic tool for evaluating the goodness-of-fit of Hawkes process models [41]. Examples are shown in Figure 2 and Figure 3, where transformed inter-event times—obtained via intensity normalization—are plotted against theoretical quantiles. Plots (a) illustrate Q–Q plots for univariate Hawkes processes, while plots (b) correspond to multivariate versions. The univariate models (plots (a)) exhibit pronounced deviations from the reference line, particularly in regions of short inter-event times, indicating inadequate fit in those intervals. In contrast, the multivariate models (plots (b)) display notable improvement, with transformed times aligning more closely with the theoretical distribution. This improvement is consistently observed across the majority of important words identified within each text corpus. A more detailed quantitative analysis of inter-event times is currently underway and will be presented in a separate publication.
Figure 2. Q–Q plots for the word “formation” in the Darwin text: (a) univariate Hawkes process and (b) multivariate Hawkes process.
Figure 3. Q–Q plots for the word “selection” in the Darwin text: (a) univariate Hawkes process and (b) multivariate Hawkes process.
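One common way to build such a Q–Q plot is the time-rescaling approach: if the fitted model is correct, the increments of the integrated intensity between successive events are independent and follow a unit-rate exponential distribution. The sketch below does this for a univariate process with the kernel of Equation (3). It assumes that the "intensity normalization" mentioned above refers to this transformation, and the function names `compensator` and `qq_points` as well as the plotting positions $(k - 0.5)/n$ are choices made for the example.

```python
import numpy as np

def compensator(t, times, mu, alpha, beta):
    """Integrated intensity from 0 to t for the exponential-kernel process."""
    past = times[times < t]
    return mu * t + (alpha / beta) * np.sum(1.0 - np.exp(-beta * (t - past)))

def qq_points(times, mu, alpha, beta):
    """Return (theoretical, empirical) quantiles of the rescaled inter-event times."""
    times = np.asarray(times, dtype=float)
    Lam = np.array([compensator(t, times, mu, alpha, beta) for t in times])
    # Rescaled inter-event times; Exp(1)-distributed if the model fits.
    resid = np.diff(np.concatenate([[0.0], Lam]))
    empirical = np.sort(resid)
    n = len(resid)
    # Exp(1) quantiles at plotting positions (k - 0.5) / n.
    theoretical = -np.log(1.0 - (np.arange(1, n + 1) - 0.5) / n)
    return theoretical, empirical

# Toy usage with invented event times and parameters.
theo, emp = qq_points([1, 3, 4, 9, 10, 25], mu=0.1, alpha=0.4, beta=0.8)
```

Plotting `emp` against `theo` and comparing with the line $y = x$ gives the diagnostic. For the multivariate process, the compensator of word $i$ would also include the contributions of all other words, following Equation (5).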
Another approach for validating the applicability of both univariate and multivariate Hawkes processes involves simulating these models and comparing the simulated word occurrence signals with those observed in actual text data.
In this second approach, we first estimated the optimal values of the parameters μ , α  and β  in Equations (3) and (4) using maximum likelihood estimation (MLE) in order to evaluate the effectiveness of the univariate Hawkes process. According to the principle of maximum likelihood, the fitted univariate Hawkes process is expected to most accurately replicate the occurrence signal of the target word. Using the estimated parameters, we generated simulated word occurrence signals for the top 20 important words by running separate simulations for each corresponding univariate Hawkes process. These simulated signals were then compared with the real-word signals, with a particular focus on autocorrelation functions (ACFs). If the ACF of a simulated signal closely resembles that of the real signal, we deem the univariate Hawkes process effective for modeling that word’s occurrence pattern.
A similar procedure was followed to validate the multivariate Hawkes process. First, we estimated the parameters of a 20-dimensional multivariate Hawkes process using MLE, based on the word occurrence signals of the top 20 important words. Next, we simulated this multivariate model to generate 20 corresponding virtual signals. Finally, we compared the ACFs derived from the simulated and actual signals for each word to assess the adequacy of the multivariate Hawkes process in capturing the dynamics of word occurrences.
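The simulation algorithm is not specified above; one standard choice for Hawkes processes with exponential kernels is Ogata's thinning method, sketched below for the multivariate case (the univariate case is $d = 1$). The function names `simulate_hawkes` and `to_binary_signal`, the parameter values, and in particular the rule for converting continuous event times into a binary per-sentence signal (round up to the next sentence and merge duplicates) are assumptions made for this example.

```python
import numpy as np

def simulate_hawkes(mu, alpha, beta, T, rng):
    """Simulate event times on (0, T) by thinning; alpha[i, j] is the effect of j on i."""
    mu = np.asarray(mu, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    beta = np.asarray(beta, dtype=float)
    d = len(mu)
    events = [[] for _ in range(d)]
    # excite[i, j]: current excitation of type i caused by past type-j events.
    excite = np.zeros((d, d))
    t = 0.0
    while True:
        # Total intensity now; it bounds the intensity until the next event
        # because the exponential kernels only decay.
        lam_bar = mu.sum() + excite.sum()
        w = rng.exponential(1.0 / lam_bar)
        if t + w >= T:
            break
        excite *= np.exp(-beta * w)          # decay up to the candidate time
        t += w
        lam = mu + excite.sum(axis=1)        # per-type intensities at time t
        u = rng.uniform(0.0, lam_bar)
        if u < lam.sum():                    # accept the candidate
            i = int(np.searchsorted(np.cumsum(lam), u, side="right"))
            events[i].append(t)
            excite[:, i] += alpha[:, i]      # a type-i event excites every type
    return [np.array(e) for e in events]

def to_binary_signal(times, T):
    """Map event times to a binary per-sentence signal (assumed discretization)."""
    x = np.zeros(int(T), dtype=int)
    idx = np.ceil(times).astype(int) - 1
    idx = idx[(idx >= 0) & (idx < int(T))]
    x[idx] = 1
    return x

rng = np.random.default_rng(0)
mu = [0.05, 0.02]
alpha = [[0.2, 0.1], [0.0, 0.3]]
beta = [[1.0, 1.0], [1.0, 1.0]]
sim = simulate_hawkes(mu, alpha, beta, T=500, rng=rng)
signals = [to_binary_signal(s, 500) for s in sim]
```

The simulated binary signals can then be passed through the same ACF and KWW fitting steps as the real signals. The parameters must keep the process stable (roughly, the matrix of ratios $\alpha_{ij}/\beta_{ij}$ should have spectral radius below 1), otherwise the number of events grows without bound.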
As mentioned before, it is evident that using only 20 words is insufficient to simulate a comprehensive representation of the central topics in a large document. Although this limitation may be addressed in future work as will be discussed in the last section, it is accepted in the present study due to the high computational cost associated with the MLE procedures.

3. Results and Discussion

3.1. Validity Confirmation of Hawkes Processes

We describe here the results of the second approach for validating our modeling. As outlined above, we primarily compare the characteristic quantities derived from simulated signals with those obtained from word occurrence signals observed in real texts. The comparisons focused on the following four quantities:
  • The total number of occurrences of the word throughout the text.
  • The relaxation time τ of the ACF, derived from the fitting parameter of the KWW function (Equation (9)).
  • The shape parameter β of the ACF, also obtained from the fitting parameter of the KWW function.
  • The Bayesian Information Criterion (BIC), calculated during the fitting of the KWW function to ACFs.
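The exact BIC expression used for the KWW fits is not given above. One common form for a least-squares fit with Gaussian errors is $\mathrm{BIC} = n \ln(\mathrm{RSS}/n) + k \ln n$, where $n$ is the number of fitted ACF points, RSS is the sum of squared residuals, and $k$ is the number of fitting parameters ($k = 2$ for $\tau$ and $\beta$). The sketch below implements this assumed form; the function name `bic_least_squares` is hypothetical.

```python
import numpy as np

def bic_least_squares(residuals, n_params):
    """BIC for a least-squares fit under a Gaussian error assumption (assumed form)."""
    residuals = np.asarray(residuals, dtype=float)
    n = residuals.size
    rss = np.sum(residuals ** 2)
    return n * np.log(rss / n) + n_params * np.log(n)

# Toy usage: 100 residuals of size 0.1 from a two-parameter (tau, beta) fit.
bic = bic_least_squares(np.full(100, 0.1), n_params=2)
```

Lower values indicate a better trade-off between fit quality and the number of parameters, so comparing the BIC of real and simulated signals checks whether the KWW function fits both equally well.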
The results of these comparisons are displayed in plots, where the vertical axis represents one of the four characteristic quantities calculated from simulated signals, while the horizontal axis represents the corresponding quantity calculated from observed occurrence signals in the real text. Namely, plots (a)–(d) in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 present comparisons between the four quantities derived from real word occurrence signals (horizontal axes) and those obtained from simulated univariate Hawkes processes (vertical axes). Linear relationships in these plots indicate that the univariate Hawkes processes, fitted using Equation (4), are sufficiently valid for representing real word occurrence signals. Similarly, plots (e)–(h) in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 display analogous comparisons using simulated multivariate Hawkes processes. Note that we used 20-dimensional Hawkes processes in plots (e)–(h) in Figure 4, Figure 5, Figure 6, Figure 7 and Figure 9, while an 8-dimensional Hawkes process was employed in plots (e)–(h) in Figure 8. If linear relationships are well established in these plots, the multivariate Hawkes processes fitted via Equation (6) offer adequate modeling of word occurrences in real texts. Moreover, if the linear relationships are more prominent in plots (e)–(h) than in plots (a)–(d), it can be concluded that modeling with multivariate Hawkes processes provides a more accurate representation than the univariate counterparts.
Figure 4. Comparison of characteristics of real and simulated word occurrence signals for the Darwin text. In all plots, the horizontal axis represents characteristics of the real signals, while the vertical axis represents characteristics derived from the simulated signals. Characteristics derived from signals simulated with univariate Hawkes processes are shown in plots (a)–(d), while those from the 20-dimensional multivariate Hawkes process are shown in plots (e)–(h). The red lines represent the function $y = x$. Plots (a) and (e) compare the number of occurrences, plots (b) and (f) the relaxation time $\tau$, plots (c) and (g) the shape parameter $\beta$, and plots (d) and (h) the Bayesian Information Criterion (BIC).
Figure 5. The same comparisons as in Figure 4, for the Einstein text. Axes, panel layout and red $y = x$ lines are as in Figure 4; plots (a)–(d) use univariate Hawkes processes and plots (e)–(h) the 20-dimensional multivariate Hawkes process.
Figure 6. The same comparisons as in Figure 4, for the Faraday text. Axes, panel layout and red $y = x$ lines are as in Figure 4; plots (a)–(d) use univariate Hawkes processes and plots (e)–(h) the 20-dimensional multivariate Hawkes process.
Figure 7. The same comparisons as in Figure 4, for the Freud text. Axes, panel layout and red $y = x$ lines are as in Figure 4; plots (a)–(d) use univariate Hawkes processes and plots (e)–(h) the 20-dimensional multivariate Hawkes process.
Figure 8. The same comparisons as in Figure 4, for the Paine text, except that an 8-dimensional multivariate Hawkes process was used. Axes, panel layout and red $y = x$ lines are as in Figure 4; plots (a)–(d) use univariate Hawkes processes and plots (e)–(h) the 8-dimensional multivariate Hawkes process.
Figure 9. The same comparisons as in Figure 4, for the Plato text. Axes, panel layout and red $y = x$ lines are as in Figure 4; plots (a)–(d) use univariate Hawkes processes and plots (e)–(h) the 20-dimensional multivariate Hawkes process.
Word occurrence signals were simulated 20 times for both univariate and multivariate Hawkes processes, resulting in 20 vertical values for each characteristic quantity corresponding to each word. The upper end of each error bar represents the maximum of these 20 values, while the lower end denotes the minimum. The vertical positions of the blue circles in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 indicate the averages of these 20 values.
If the simulated signal for a given word closely matches the real signal, the points for the four characteristic quantities in plots (a)–(h) should lie on or near the line $y = x$, which is drawn in red in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9. Thus, we can conclude that modeling with univariate or multivariate Hawkes processes is effective if the blue circles are positioned along this line. To assess the degree of linear correspondence between the vertical and horizontal quantities, we calculated correlation coefficients, which are displayed in the titles of the plots in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9. The validation criterion is therefore how closely the correlation coefficient approaches 1.
Our primary focus is to evaluate the extent to which the effectiveness of modeling improves when transitioning from the univariate Hawkes process to the multivariate Hawkes process. To assess this, we compared the description accuracies of the univariate Hawkes process and the multivariate Hawkes process across Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9: comparisons between plots (a) and (e) for the number of occurrences, between plots (b) and (f) for the relaxation time τ , between plots (c) and (g) for the shape parameter β , and between plots (d) and (h) for the Bayesian Information Criterion (BIC).
These comparisons clearly demonstrate that the multivariate Hawkes process significantly enhances modeling effectiveness compared to the univariate Hawkes process. This improvement is evidenced by the scatter plots of the four characteristic quantities in Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9 being distributed closer to the red straight lines y = x in the case of the multivariate Hawkes process, with correlation coefficients approaching 1. Table 8 shows such improvement. In the table, each coefficient indicates the degree of correlation between the real and simulated word occurrence signals for the given quantity. The simulated signals are generated using either univariate or multivariate Hawkes processes, and both cases are presented in the table for comparison. Indeed, Table 8 confirms that the correlation coefficient for each quantity is closer to 1 when modeling with the multivariate Hawkes process is applied.
Table 8. Comparison of the correlation coefficients for the four characteristic quantities.
Table 9 supplements Table 8 by summarizing the p-values of the test for lack of correlation (independence test). Cells corresponding to significance levels below 10%, 5%, and 1% are shaded with progressively darker tones to indicate increasing levels of significance. As shown by the increased shading in the lower half of the table, the multivariate Hawkes process yields stronger correlations between the simulated quantities and the observed signal values. This trend suggests that the multivariate Hawkes process more accurately captures and reproduces the real signals of word occurrences.
Table 9. p-value of the test for lack of correlation (independence test).
The results described above are unsurprising given the nature of the documents: each presents concepts that are explained using multiple key terms. These key terms are frequently repeated throughout the explanation, creating inherent correlations among them. As a result, the multivariate Hawkes process effectively captures these relationships. In contrast, the univariate Hawkes process is limited to capturing self-correlation, meaning it only reflects the correlations found within the occurrence signal of a single word.

3.2. Hawkes Graphs

Hawkes graphs are directed graphs that intuitively represent multivariate Hawkes processes. In these graphs, each node corresponds to a specific type of event in the multivariate Hawkes process, while each edge indicates how one type of event (represented at the root of the arrow) influences or enhances another type of event (located at the tip of the arrow). Examples of Hawkes graphs are presented in Figure 10 and Figure 11, which illustrate the multivariate Hawkes processes optimized to model word occurrence signals in the Paine text and the Plato text, respectively. Each node represents the occurrence events of an important word, and its corresponding background intensity, $\mu_i$, as defined in Equation (5), is also displayed within the node.
Figure 10. An example of a Hawkes graph representing the Paine text. Values of $a_{ij}$ are shown near the corresponding arrows, and values of $\mu_i$ are indicated within their respective nodes. Node and arrow colors are shaded according to the magnitudes of $a_{ij}$ and $\mu_i$.
Figure 11. An example of a Hawkes graph representing the Plato text. Values of $a_{ij}$ are shown near the corresponding arrows, and values of $\mu_i$ are indicated within their respective nodes. Node and arrow colors are shaded according to the magnitudes of $a_{ij}$ and $\mu_i$.
In a multivariate Hawkes process, various types of events mutually excite one another, and the magnitude of their mutual excitation is quantified by the following coefficient [32]:
$$a_{ij} = \frac{\alpha_{ji}}{\beta_{ji}} = \int_{0}^{\infty} \alpha_{ji} \exp\left(-\beta_{ji} t\right) dt \tag{11}$$
Note that $\alpha_{ji}$ quantifies the intensity by which the occurrence of the $i$-th word enhances the likelihood of the $j$-th word occurring, and $\beta_{ji}$ represents the decay rate of this enhancement effect. Thus, the integral in Equation (11) expresses how strongly the occurrence of the $i$-th word induces the occurrence of the $j$-th word, taking the duration of the enhancement into account. The order of the subscripts $i$ and $j$ is deliberately reversed in $a_{ij}$ (on the left-hand side) to make it intuitively clear that it is a quantity in the $i \to j$ direction. That is, $a_{ij}$ measures how much an event of type $i$ enhances the occurrence of events of type $j$. In Figure 10 and Figure 11, the values of $a_{ij}$ defined by Equation (11) are displayed alongside the edges, while the numbers within the nodes represent the background intensities $\mu_i$. The colors of the nodes and arrows are shaded such that higher values of $\mu_i$ and $a_{ij}$ result in darker nodes and arrows. To improve readability, edges with $a_{ij} < 0.01$ are not shown in the graphs.
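As an illustration of how the edge weights of a Hawkes graph follow from the fitted parameters, the sketch below computes $a_{ij} = \alpha_{ji} / \beta_{ji}$ and keeps only the edges that would be drawn. The parameter values, the function names `excitation_matrix` and `graph_edges`, and the words other than "time" are hypothetical. Whether self-loops ($i = j$) are drawn in the published figures is not stated here, so they are simply kept.

```python
def excitation_matrix(alpha, beta):
    """Return a with a[i][j] = alpha[j][i] / beta[j][i].

    alpha[j][i] and beta[j][i] describe the effect of word i on word j,
    so the indices are swapped to give a quantity in the i -> j direction.
    """
    d = len(alpha)
    return [[alpha[j][i] / beta[j][i] for j in range(d)] for i in range(d)]

def graph_edges(a, words, threshold=0.01):
    """List (source, target, weight) for edges with weight >= threshold."""
    d = len(a)
    return [
        (words[i], words[j], a[i][j])
        for i in range(d)
        for j in range(d)
        if a[i][j] >= threshold
    ]

# Hypothetical fitted parameters for three words (rows: target j, columns: source i).
words = ["time", "government", "england"]
alpha = [
    [0.20, 0.10, 0.05],
    [0.00, 0.30, 0.001],
    [0.04, 0.00, 0.25],
]
beta = [
    [1.0, 0.5, 0.5],
    [1.0, 1.0, 0.5],
    [2.0, 1.0, 1.0],
]

a = excitation_matrix(alpha, beta)
edges = graph_edges(a, words)
for source, target, weight in edges:
    print(f"{source} -> {target}: {weight:.3f}")
```

In this toy example the edge from "england" to "government" has weight 0.002 and is therefore omitted, mirroring the 0.01 cut-off used for the figures.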
By constructing a Hawkes graph for a given text using the optimized parameter values of the multivariate Hawkes process, it becomes easier to infer the central concepts or notions of the text. For instance, based on Figure 10, we identify “time” as one of the central notions in the Paine text. This conclusion is drawn from observing that five edges associated with the “time” node are incoming arrows, while only one edge flows outward from this node. The predominance of incoming arrows indicates that the concepts represented by the source nodes are predominantly used to describe the notion of “time.”

3.3. Finding Important Notions in Texts

The idea of inferring the central notions of a text from the Hawkes graph, as mentioned above, can be refined as follows. The difference between the inflows into the $i$-th node and the outflows from the same node is defined as:
$$\Delta a_i = \sum_{j=1}^{d} a_{ji} - \sum_{j=1}^{d} a_{ij}.$$
This quantity intuitively represents the net extent to which the notion corresponding to the $i$-th node is explained by other important notions.
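The net inflow can be computed directly from the matrix of edge weights. The following is a minimal sketch with a hypothetical 3 × 3 matrix, where `a[i][j]` stands for the weight of the edge from node $i$ to node $j$; the function name `net_inflow` and all values are illustrative only.

```python
def net_inflow(a):
    """Return the list of inflow minus outflow for every node.

    Inflow of node i is the sum over j of a[j][i] (edges pointing at i);
    outflow of node i is the sum over j of a[i][j] (edges leaving i).
    """
    d = len(a)
    inflow = [sum(a[j][i] for j in range(d)) for i in range(d)]
    outflow = [sum(a[i][j] for j in range(d)) for i in range(d)]
    return [inflow[i] - outflow[i] for i in range(d)]

# Hypothetical edge weights: a[i][j] is the influence of word i on word j.
words = ["time", "government", "england"]
a = [
    [0.2, 0.0, 0.02],
    [0.2, 0.3, 0.0],
    [0.1, 0.002, 0.25],
]

delta = net_inflow(a)

# Rank the words in descending order of net inflow.
ranking = sorted(zip(words, delta), key=lambda pair: pair[1], reverse=True)
for word, value in ranking:
    print(f"{word}: {value:+.3f}")
```

Two properties follow from the definition: the self-excitation terms $a_{ii}$ cancel in the difference, and the values sum to zero over all nodes, so a positive value for one notion is always balanced by negative values elsewhere.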
The values of $\Delta a_i$ are displayed in Table 10, which ranks the importance of each notion in descending order for each text. The most important notion for each text (represented by the top-ranked word in Table 10) can be considered appropriately identified based on the content of each text, as will be discussed below.
Table 10. The difference between inflows and outflows, $\Delta a_i$, for each important notion.
Clearly, the notion of “species” in the Darwin text, “relativity” in the Einstein text, and “dream” in the Freud text are all closely tied to the main themes of the respective documents, serving as the central concepts each text seeks to explain. In the Faraday text, the principles of chemistry and physics are explored through the fascinating subject of candles. The most significant topic in the Faraday text is the investigation of the substances that constitute candles and how they transform during combustion. Thus, it is reasonable to identify “substance” as the most critical concept in the text. The Paine text is a groundbreaking pamphlet published during the American Revolution, emphasizing the urgency of declaring independence. In this text, the author employs the notion of “time” as a motivational framework for driving transformative action. Hence, “time” is regarded as central because the author explains the nature of the current “time” within this context. In the Plato text, the notion of “man” takes center stage because the author’s philosophical reflections on justice, governance, and enlightenment are deeply rooted in his understanding of human nature and elaborated upon through this perspective.
To assess the validity of using $\Delta a_i$ as a measure of word importance in texts, we compared $\Delta a_i$ against other widely used metrics for word significance. Specifically, we evaluated the TF-IDF scores and degree centralities [42] for each of the key words used in the multivariate Hawkes process modeling of the considered texts, and investigated how well the values of $\Delta a_i$ aligned with these metrics. The calculated values of TF-IDF and degree centrality are listed in Table 10, and the resulting correlation coefficients between $\Delta a_i$ and TF-IDF, as well as between $\Delta a_i$ and degree centrality, are presented in Table 11 for each book.
Table 11. Correlation coefficients between $\Delta a_i$ and TF-IDF, and between $\Delta a_i$ and degree centrality, for each book.
Notably, the correlation between $\Delta a_i$ and degree centrality was substantially higher than that between $\Delta a_i$ and TF-IDF, as confirmed by a paired t-test (p = 0.04433). This outcome is reasonable given the nature of the metrics: TF-IDF is derived purely from the frequency statistics of individual words across documents, whereas degree centrality incorporates information about word co-occurrence. Thus, both degree centrality and $\Delta a_i$ reflect inter-word relationships, which likely explains their stronger correlation.
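For reference, a paired t-test of this kind can be written out as follows. This is a sketch under assumptions not stated in the text: that the pairs are the $n = 6$ books of Table 11, that the standard Student form of the test is used, and that $r_k^{\mathrm{DC}}$ and $r_k^{\mathrm{TF}}$ (symbols introduced here for illustration) denote the correlation of $\Delta a_i$ with degree centrality and with TF-IDF, respectively, for the $k$-th book.

$$d_k = r_k^{\mathrm{DC}} - r_k^{\mathrm{TF}}, \qquad \bar{d} = \frac{1}{n} \sum_{k=1}^{n} d_k, \qquad s_d^2 = \frac{1}{n-1} \sum_{k=1}^{n} \left( d_k - \bar{d} \right)^2, \qquad t = \frac{\bar{d}}{s_d / \sqrt{n}}.$$

Under the null hypothesis of zero mean difference, $t$ follows a Student t-distribution with $n - 1 = 5$ degrees of freedom, from which the p-value is obtained.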
A key distinction between $\Delta a_i$ and degree centrality lies in their graph structures: $\Delta a_i$ is computed based on directed graphs (where edges, as shown in Figure 10 and Figure 11, are represented by arrows), while degree centrality, when used to measure word importance, is typically calculated using undirected graphs.
Although a comprehensive strategy for evaluating $\Delta a_i$ via comparison with other metrics is not currently available, the observed strong correlation with degree centrality supports the reliability of $\Delta a_i$ as an indicator of word importance.

3.4. Advantages of Analyzing Texts with Multivariate Hawkes Processes

Based on the results outlined so far, the following advantages of utilizing the multivariate Hawkes process have been established:
  • As demonstrated in Figure 2, Figure 3, Figure 4, Figure 5, Figure 6, Figure 7, Figure 8 and Figure 9, the multivariate Hawkes process models word occurrence signals in documents significantly more accurately, and hence more reliably, than the univariate one.
  • A Hawkes graph can be generated from the parameter values obtained through the analysis using the multivariate Hawkes process. This facilitates an intuitive understanding of the relationships among the concepts that emerge in the document.
  • The importance of each concept identified in a text can be assessed using the optimized parameters of the multivariate Hawkes process. The most significant concepts suggested for each text analyzed in this study are confirmed to be valid when considering the content of the text.

4. Conclusions

The occurrence signals of important words in six texts (one historical pamphlet and five renowned academic books) were modeled using univariate and multivariate Hawkes processes. The modeling procedure for the univariate Hawkes process was conducted as follows. First, optimized parameter values for the univariate Hawkes process were determined using maximum likelihood estimation (MLE) based on the occurrence signals of a given word in a considered text. Next, word occurrence signals were generated by simulating the univariate Hawkes process with optimized parameters. Finally, the validity of the univariate Hawkes process modeling was confirmed by verifying that the four characteristic quantities derived from the simulated word occurrence signals closely matched those obtained from the real signals observed in the text.
The modeling procedure for the multivariate Hawkes process followed a similar approach. First, optimized parameters for the $d$-dimensional multivariate Hawkes process were evaluated using MLE based on the occurrence signals of $d$ important words, where $d$ represents the number of important words analyzed simultaneously with the multivariate Hawkes process. In this study, the typical value of $d$ was set to 20. Next, occurrence signals of $d$ important words were simultaneously simulated using the multivariate Hawkes process with the optimized parameters. Finally, the validity of the multivariate Hawkes process modeling was confirmed by ensuring that the four characteristic quantities derived from the simulated word occurrence signals closely matched those obtained from the real signals observed in the text.
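The simulation step can be illustrated with Ogata's thinning algorithm, a standard way to sample Hawkes processes. The sketch below is not the authors' implementation (their code is in the repository listed under Data Availability). It assumes the exponential kernel implied by Equation (11), so that the intensity of word $j$ is $\mu_j$ plus the sum of $\alpha_{ji} \exp(-\beta_{ji}(t - s))$ over all past occurrences of word $i$ at positions $s$. The function names and the two-dimensional parameter values are hypothetical, and the conversion of event times back into a binary occurrence signal is not shown.

```python
import math
import random

def intensities(t, events, mu, alpha, beta):
    """Intensity of every word at time t, given past events [(time, word), ...]."""
    d = len(mu)
    lam = list(mu)
    for s, i in events:
        for j in range(d):
            lam[j] += alpha[j][i] * math.exp(-beta[j][i] * (t - s))
    return lam

def simulate_hawkes(mu, alpha, beta, t_end, rng):
    """Simulate a multivariate Hawkes process on [0, t_end) by thinning."""
    events = []
    t = 0.0
    while True:
        # Exponential kernels only decay between events, so the total
        # intensity at the current time bounds it until the next event.
        lam_bar = sum(intensities(t, events, mu, alpha, beta))
        t += rng.expovariate(lam_bar)  # candidate event time
        if t >= t_end:
            return events
        lam = intensities(t, events, mu, alpha, beta)
        # Accept the candidate with probability sum(lam) / lam_bar and
        # assign it to word j with probability proportional to lam[j].
        u = rng.random() * lam_bar
        cumulative = 0.0
        for j, lam_j in enumerate(lam):
            cumulative += lam_j
            if u < cumulative:
                events.append((t, j))
                break

# Hypothetical parameters for d = 2 words (alpha[j][i]: effect of word i on word j).
mu = [0.5, 0.2]
alpha = [[0.3, 0.2], [0.1, 0.4]]
beta = [[1.0, 1.0], [1.0, 1.0]]

rng = random.Random(0)  # fixed seed for reproducibility
events = simulate_hawkes(mu, alpha, beta, t_end=200.0, rng=rng)
counts = [sum(1 for _, j in events if j == k) for k in range(len(mu))]
print("number of simulated occurrences per word:", counts)
```

This direct implementation recomputes the full history at every step and is therefore only practical for short signals; it is meant to show the logic, not to be efficient.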
By validating the modeling with univariate and multivariate Hawkes processes, we observed that the correlation coefficients of the four characteristic quantities between real and simulated signals are notably closer to one in the case of the multivariate Hawkes process compared to the univariate one. This result demonstrates that the multivariate Hawkes process provides greater accuracy in describing word occurrence signals.
Additional advantages of the multivariate Hawkes process include its ability to intuitively represent the relationships among concepts described in a document through a Hawkes graph and to evaluate the importance of words in the text based on their relationships. For instance, the most important words in the analyzed documents were inferred using the optimized parameters of multivariate Hawkes processes, and these findings were confirmed to be valid.
A key limitation of this study is the relatively low dimensionality used in the multivariate Hawkes process. Specifically, we typically set the dimension to 20, which is insufficient to capture the mutual enhancement of word occurrences in large-scale documents. As previously discussed, this constraint stems from the impractical computational cost of maximum likelihood estimation (MLE) in dimensions beyond 20. Efficient computation using recursive algorithms [43] may help mitigate this issue, and future research exploring this direction would be a compelling subject of investigation. Other potential directions for future research utilizing the multivariate Hawkes process, which proved useful in this study, include:
  • Treating the matrix $a_{ij}$ (defined by Equation (11)) as an adjacency matrix and applying graph theory methods, such as spectral clustering.
  • Analyzing texts using advanced stochastic processes, such as the autoregressive-type Hawkes process [44]—a discretized variant of the standard Hawkes model—which is particularly well-suited for high-dimensional analyses (e.g., with more than 20 dimensions) due to its lower computational complexity. An alternative promising framework is the “Flexible Triggering Kernels” model [45], which effectively encodes event history and captures localized excitation dynamics beyond traditional decay-based kernels.

Author Contributions

Conceptualization, H.O.; methodology, H.O.; software, H.O.; validation, Y.H., K.O. and M.K.; formal analysis, M.K.; investigation, H.O.; resources, H.O.; data curation, K.O.; writing—original draft preparation, H.O.; writing—review and editing, Y.H.; visualization, H.O. and K.O.; supervision, H.O.; project administration, H.O. and M.K.; funding acquisition, H.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by JSPS KAKENHI, grant number 24K15198.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Datasets and the major source codes used in this study are available in the Open Science Framework repository at https://osf.io/6d8sf/ (accessed on 1 October 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
MLE: Maximum Likelihood Estimation
ACF: Autocorrelation Function
BIC: Bayesian Information Criterion
KWW: Kohlrausch–Williams–Watts

References

  1. Pawlowski, A. Time-Series Analysis in Linguistics. Application of the Arima Method to Some Cases of Spoken Polish. J. Quant. Linguist. 1997, 4, 203–221.
  2. Pawlowski, A. Language in the Line vs. Language in the Mass: On the Efficiency of Sequential Modelling in the Analysis of Rhythm. J. Quant. Linguist. 1999, 6, 70–77.
  3. Pawlowski, A. Modelling of Sequential Structures in Text. In Handbooks of Linguistics and Communication Science; Walter de Gruyter: Berlin, Germany, 2005; pp. 738–750.
  4. Pawlowski, A.; Eder, M. Sequential Structures in “Dalimil’s Chronicle”; Mikros, G.K., Macutek, J., Eds.; Walter de Gruyter: Berlin, Germany, 2015; pp. 147–170.
  5. Altmann, E.G.; Pierrehumbert, J.B.; Motter, A.E. Beyond word frequency: Bursts, lulls, and scaling in the temporal distributions of words. PLoS ONE 2009, 4, e7678.
  6. Tanaka-Ishii, K.; Bunde, A. Long-range memory in literary texts: On the universal clustering of the rare words. PLoS ONE 2016, 11, e0164658.
  7. Schenkel, A.; Zhang, J.; Zhang, Y. Long range correlation in human writings. Fractals 1993, 1, 47–57.
  8. Ebeling, W.; Pöschel, T. Entropy and long-range correlations in literary English. Europhys. Lett. 1994, 26, 241.
  9. Montemurro, M.A.; Pury, P.A. Long-range fractal correlations in literary corpora. Fractals 2002, 10, 451–461.
  10. Alvarez-Lacalle, E.; Dorow, B.; Eckmann, J.P.; Moses, E. Hierarchical structures induce long-range dynamic correlations in written texts. Proc. Natl. Acad. Sci. USA 2006, 103, 7956–7961.
  11. Altmann, E.G.; Cristadoro, G.; Esposti, M.D. On the origin of long-range correlations in texts. Proc. Natl. Acad. Sci. USA 2012, 109, 11582–11587.
  12. Chatzigeorgiou, M.; Constantoudis, V.; Diakonos, F.; Karamanos, K.; Papadimitriou, C.; Kalimeri, M.; Papageorgiou, H. Multifractal correlations in natural language written texts: Effects of language family and long word statistics. Physica A 2017, 469, 173–182.
  13. Ogura, H.; Amano, H.; Kondo, M. Measuring Dynamic Correlations of Words in Written Texts with an Autocorrelation Function. J. Data Anal. Inf. Process. 2019, 7, 46–73.
  14. Ogura, H.; Amano, H.; Kondo, M. Origin of Dynamic Correlations of Words in Written Texts. J. Data Anal. Inf. Process. 2019, 7, 228–249.
  15. Ogura, H.; Amano, H.; Kondo, M. Simulation of pseudo-text synthesis for generating words with long-range dynamic correlations. SN Appl. Sci. 2020, 2, 1387.
  16. Ogura, H.; Hanada, Y.; Amano, H.; Kondo, M. A stochastic model of word occurrences in hierarchically structured written texts. SN Appl. Sci. 2022, 4, 77.
  17. Ogura, H.; Hanada, Y.; Amano, H.; Kondo, M. Modeling Long-Range Dynamic Correlations of Words in Written Texts with Hawkes Processes. Entropy 2022, 24, 858.
  18. Hawkes, A.G. Spectra of Some Self-Exciting and Mutually Exciting Point Processes. Biometrika 1971, 58, 83–90.
  19. Ogata, Y. Statistical models for earthquake occurrences and residual analysis for point processes. J. Amer. Statist. Assoc. 1988, 83, 9–27.
  20. Ogata, Y. Seismicity analysis through point-process modeling: A review. Pure Appl. Geophys. 1999, 155, 471–507.
  21. Zhuang, J.; Ogata, Y.; Vere-Jones, D. Stochastic declustering of space-time earthquake occurrences. J. Amer. Statist. Assoc. 2002, 97, 369–380.
  22. Truccolo, W.; Eden, U.T.; Fellows, M.R.; Donoghue, J.P.; Brown, E.N. A Point Process Framework for Relating Neural Spiking Activity to Spiking History, Neural Ensemble, and Extrinsic Covariate Effects. J. Neurophysiol. 2005, 93, 1074–1089.
  23. Reynaud-Bouret, P.; Rivoirard, V.; Tuleau-Malot, C. Inference of functional connectivity in Neurosciences via Hawkes processes. In Proceedings of the 1st IEEE Global Conference on Signal and Information Processing, Austin, TX, USA, 3–5 December 2013.
  24. Gerhard, F.; Deger, M.; Truccolo, W. On the stability and dynamics of stochastic spiking neuron models: Nonlinear Hawkes process and point process GLMs. PLoS Comput. Biol. 2017, 13, e1005390.
  25. Bacry, E.; Mastromatteo, I.; Muzy, J. Hawkes Processes in Finance. Mark. Microstruct. Liq. 2015, 1, 1550005.
  26. Rizoiu, M.A.; Lee, Y.; Mishra, S.; Xie, L. Hawkes processes for events in social media. In Frontiers of Multimedia Research, 1st ed.; Chang, S., Ed.; Association for Computing Machinery and Morgan & Claypool: New York, NY, USA, 2017; pp. 191–218.
  27. Palmowski, Z.; Puchalska, D. Modeling social media contagion using Hawkes processes. J. Pol. Math. Soc. 2021, 49, 65–83.
  28. Chiang, W.H.; Liu, X.; Mohler, G. Hawkes process modeling of COVID-19 with mobility leading indicators and spatial covariates. Int. J. Forecast. 2022, 38, 505–520.
  29. Embrechts, P.; Liniger, T.; Lin, L. Multivariate Hawkes processes: An application to financial data. J. Appl. Probab. 2011, 48, 367–378.
  30. Bowsher, C.G. Modelling security market events in continuous time: Intensity based, multivariate point process models. J. Econom. 2007, 141, 876–912.
  31. Yang, S.Y.; Liu, A.; Chen, J.; Hawkes, A. Applications of a multivariate Hawkes process to joint modeling of sentiment and market return events. Quant. Finance 2018, 18, 295–310.
  32. Zhou, K.; Zha, H.; Song, L. Learning Triggering Kernels for Multi-dimensional Hawkes Processes. In Proceedings of the 30th International Conference on Machine Learning, Proceedings of Machine Learning Research, Atlanta, GA, USA, 17–19 June 2013; Volume 28, pp. 1301–1309. Available online: https://proceedings.mlr.press/v28/zhou13.html (accessed on 25 September 2025).
  33. Luo, D.; Xu, H.; Zhen, Y.; Ning, X.; Zha, H.; Yang, X.; Zhang, W. Multi-task multi-dimensional Hawkes processes for modeling event sequences. In Proceedings of the 24th International Conference on Artificial Intelligence 2015, Buenos Aires, Argentina, 25–31 July 2015; pp. 3685–3691. Available online: https://hdl.handle.net/1805/10142 (accessed on 25 September 2025).
  34. Embrechts, P.; Kirchner, M. Hawkes graphs. Theory Probab. Its Appl. 2018, 62, 132–156.
  35. Osman, A.H.; Barukub, O.M. Graph-Based Text Representation and Matching: A Review of the State of the Art and Future Challenges. IEEE Access 2020, 8, 87562–87583.
  36. Segev, E., Ed. Semantic Network Analysis in Social Sciences, 1st ed.; Taylor & Francis: New York, NY, USA, 2022.
  37. Laub, P.J.; Taimre, T.; Pollett, P. Hawkes Processes. arXiv 2015, arXiv:1507.02822.
  38. Laub, P.J.; Lee, Y.; Pollett, K.P.; Taimre, T. Hawkes Models and Their Applications. arXiv 2024, arXiv:2405.10527.
  39. Lindström, A.; Lindgren, S.; Sainudiin, R. Hawkes Processes on Social and Mass Media, International Conference on Data Technologies and Applications 2023. Available online: https://uu.diva-portal.org/smash/get/diva2:1827189/FULLTEXT01.pdf (accessed on 25 September 2025).
  40. Daley, D.J.; Vere-Jones, D. An Introduction to the Theory of Point Processes: Volume I: Elementary Theory and Methods, 2nd ed.; Springer-Verlag: New York, NY, USA, 2003.
  41. Luo, R.; Krishnamurthy, V.; Blasch, E. Hawkes Process Modeling of Block Arrivals in Bitcoin Blockchain. arXiv 2022, arXiv:2203.16666.
  42. Nho, J.H.; Park, S. Research trends in the Korean Journal of Women Health Nursing from 2011 to 2021: A quantitative content analysis. Korean J. Women Health Nurs. 2023, 29, 128–136.
  43. Brisley, T.; Ross, G.; Paulin, D.; Easto, J. Estimation of Multivariate Discrete Hawkes Processes: An Application to Incident Monitoring. arXiv 2023, arXiv:2305.20085.
  44. Chen, S.; Shojaie, A.; Shea-Brown, E.; Witten, D. The multivariate Hawkes process in high dimensions: Beyond mutual excitation. arXiv 2017, arXiv:1707.04928.
  45. Isik, Y.A.; Chapfuwa, P.; Davis, C.; Henao, R. Flexible Triggering Kernels for Hawkes Process Modeling. Proc. Mach. Learn. Res. 2023, 219, 1–17. Available online: https://proceedings.mlr.press/v219/isik23a/isik23a.pdf (accessed on 25 September 2025).
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
