Interpretable Non-Separable Spatio-Temporal Interaction Cox Model for Diffusion Prediction in Invasive Species Management

Zhang, Yantao; Li, Yangyang; Wang, Shuxin; Wang, Jingxuan; Yasrab, Robail; Wu, Xinli

doi:10.3390/a19050408


by Yantao Zhang ^1,†, Yangyang Li ^1,†, Shuxin Wang ^1, Jingxuan Wang ^2,*, Robail Yasrab ^3 and Xinli Wu ^1,*

^1 School of Medical Information, Wannan Medical University, Wuhu 241000, China

^2 School of Public Health, Anhui Medical University, Hefei 230032, China

^3 MRC Biostatistics Unit, University of Cambridge, Cambridge CB2 1TN, UK

^* Authors to whom correspondence should be addressed.

^† These authors contributed equally to this work.

Algorithms 2026, 19(5), 408; https://doi.org/10.3390/a19050408

Submission received: 6 April 2026 / Revised: 7 May 2026 / Accepted: 14 May 2026 / Published: 19 May 2026

(This article belongs to the Topic Applications of NLP, AI, and ML in Software Engineering)


Abstract

Accurate prediction of invasive species diffusion is essential for effective management and ecological conservation. Existing spatio-temporal Cox process models face limitations due to the separability assumption, which fails to capture spatio-temporal coupling dynamics inherent in biological diffusion processes. This study proposes a Spatio-Temporal Interaction Kernel Cox (STIK-Cox) model that constructs a non-separable conditional intensity function integrating baseline intensity, spatial and temporal proximity kernels, seasonal fluctuation, and a spatio-temporal interaction term. The model employs maximum likelihood estimation with Limited-memory Broyden–Fletcher–Goldfarb–Shanno with Bounds (L-BFGS-B) optimisation and incorporates SHapley Additive exPlanations (SHAP) for interpretability analysis. Using the Vespa mandarinia (Hymenoptera, Vespidae) monitoring dataset from Washington State, the model achieves a comprehensive accuracy score of 0.957, a capture rate of 98.74% at a 0.5° threshold, and a mean prediction error of 0.0802°. K-function analysis confirms effective capture of spatial clustering patterns, while SHAP analysis reveals longitude as the primary predictive driver. The non-separable design outperforms conventional methods including inverse distance weighting and Poisson point processes. This framework demonstrates the potential of non-separable spatio-temporal point processes for invasive species early warning, providing a scientific basis for targeted monitoring and resource allocation in ecological management.

Keywords: spatio-temporal Cox process; non-separable kernels; spatio-temporal interaction; Vespa mandarinia; diffusion forecasting

1. Introduction

The management of invasive species and ecological conservation are key areas for safeguarding biodiversity and agricultural security [1,2]. As globalisation accelerates, the risk of invasive species is increasing [3]. Notably, V. mandarinia was first detected in North America in 2019 [4,5], and this invasion poses a serious threat to local ecosystems, particularly to bee populations [6]. V. mandarinia exhibits distinctive ecological behaviour: the queen selects underground burrows for settlement. This subterranean nesting habit makes V. mandarinia nests difficult to detect, thereby increasing the difficulty of monitoring and control. Subsequently, new clusters form through nest-based dispersal, spreading from the nest to surrounding areas, and their activity exhibits distinct seasonality, nesting in spring and dispersing in autumn [7]. Historical sighting reports collected in Washington State indicate that events exhibit non-random clustering characteristics in both time and space, providing a biological basis for modelling.

Accurately predicting the diffusion of invasive species is a prerequisite for formulating effective control strategies [8,9]. However, given limited monitoring resources, achieving high-precision, interpretable diffusion predictions from temporally sparse and spatially clustered sighting data remains a key challenge [10]. Traditional monitoring methods rely on manual patrols and public reports, which often have limited coverage and struggle to capture the full spatial distribution of species. This is particularly true during the early stages of an invasion, when population densities are low and sightings are sparse and irregularly distributed, presenting significant difficulties for predicting diffusion trends. Traditional spatial interpolation or kernel density estimation methods [11,12] struggle to capture the spatio-temporal dependencies between events and fail to incorporate ecological mechanisms into the prediction process; meanwhile, complex black-box models, whilst capable of fitting the data, lack interpretability and thus cannot effectively guide practical control decisions [13]. In recent years, interpretable artificial intelligence technologies such as SHAP have provided new approaches to addressing the transparency issues of black-box models [14]. Based on the Shapley value from cooperative game theory, SHAP values quantify the marginal contribution of each feature to model predictions and have achieved significant results in fields such as medical diagnosis and financial risk control. However, research combining SHAP with spatio-temporal point process models remains scarce.

To overcome the aforementioned limitations, this study proposes the Spatio-Temporal Interaction Kernel Cox (STIK-Cox) model. The core innovation of this model lies in the construction of a spatio-temporal joint kernel function to replace the simple product of traditional separable kernels, thereby capturing the inherent spatio-temporal coupling effects in biological diffusion processes. Specifically, traditional separable models assume that spatial distance and temporal distance are independent of one another, thereby neglecting the synergistic patterns of change in invasion events across both spatial and temporal dimensions; the interaction term introduced in this study explicitly models the synergistic effects between spatial and temporal kernel values through a parametric approach, enabling the model to characterise the cumulative impact intensity of historical sighting events on neighbouring spatio-temporal regions. Parameters are estimated by maximising the log-likelihood function [15], which is solved using the L-BFGS-B optimisation algorithm. The magnitude of the spatio-temporal interaction coefficient $\rho$ reflects the intensity of the spatio-temporal synergy, providing interpretable quantitative evidence for understanding the ecological mechanisms of invasive species.

The experimental results indicate that the constructed model is capable of effectively capturing the spatio-temporal clustering characteristics of invasion events, demonstrating high reliability in predicting future sighting locations, as measured by capture rate (98.74% within 0.5°), mean prediction error (0.0802°), K-function correlation coefficient (0.9879), and overall accuracy score (0.957), with overall predictive accuracy superior to that of conventional spatio-temporal prediction methods. The main contributions of this study include:

Proposing a framework for a non-separable spatio-temporal joint kernel Cox point process, which breaks the traditional assumption of spatio-temporal independence by incorporating spatio-temporal interaction terms, thereby mathematically characterising spatio-temporal coupling effects and effectively capturing the spatiotemporally coupled clustering characteristics of the progressive diffusion of invasive species;
Introducing SHAP-based model interpretation to reveal the contribution mechanisms of individual features to intensity predictions, as well as their variation patterns under different spatio-temporal conditions.

2. Related Works

In recent years, spatio-temporal point process models have been widely applied in fields such as ecology and epidemiology, providing powerful statistical tools for understanding the mechanisms of clustering and dispersion of events across spatio-temporal dimensions [16]. The core concept of point process theory is to view the occurrence of random events as a realisation of a Poisson process, using a conditional intensity function to describe the probability of events occurring in space and time. The advantage of this modelling framework lies in its solid theoretical foundation and its ability to naturally handle the spatial and temporal dependencies of events, problems that traditional regression methods struggle to resolve. Depending on the specification of the random intensity function, spatio-temporal Cox processes can be classified into two categories: log-Gaussian Cox processes and shot-noise Cox processes. Among these, the shot-noise Cox process, which superimposes the contribution of historical events onto the background intensity, is particularly suitable for characterising clustering patterns driven jointly by environmental heterogeneity and past events [17]. Møller and Díaz-Avalos [18] proposed a structured spatio-temporal shot-noise Cox process model and applied it to the analysis of forest fire data, demonstrating the model’s flexibility in characterising clustering patterns driven by both environmental heterogeneity and historical events. Møller and Waagepetersen [19] systematically elaborated on the theoretical framework for statistical inference, providing methodological support for likelihood estimation in point process models.

Common methods for estimating parameters of the intensity function in point process models include maximum likelihood estimation, least squares estimation and Bayesian inference [20,21]; maximum likelihood estimation obtains parameter estimates by maximising the log-likelihood function and possesses good asymptotic properties. Rathbun first established the theoretical framework for maximum likelihood estimation of spatio-temporal point processes, proving the consistency and asymptotic normality of parameter estimates under regularity conditions [22]. Ogata developed a log-likelihood function based on conditional intensity for self-excited point processes, laying the foundation for MLE calculations in point processes [23]. With regard to optimisation algorithms, as the log-likelihood function of point processes is typically non-convex and possesses multiple local extrema, the selection of an appropriate optimisation algorithm is of paramount importance. The L-BFGS-B algorithm proposed by Byrd et al. approximates the Hessian matrix using a quasi-Newton method whilst supporting bound constraints on the parameters, making it particularly well-suited to the non-negativity constraints commonly encountered in point process parameter estimation [24].

In the field of spatio-temporal modelling, Cressie et al. [25] systematically discussed the limitations of separable covariance functions, pointing out that they fail to capture spatio-temporal interaction effects. Møller [26] established the foundational theoretical framework for shot-noise Cox processes, providing rigorous mathematical properties for this class of models. This research laid the groundwork for the subsequent introduction of Cox processes into ecological dispersal problems. Baddeley et al. [12] described a variety of spatial statistical methods, including the K-function and the L-function. By comparing the difference between observed data points and a complete spatial random process, the K-function can effectively test for clustering in events; this method has become a commonly used tool for validating point process models.

With regard to trajectory prediction and model accuracy evaluation, Mohamed et al. [27] proposed an implicit maximum likelihood estimation framework for trajectory prediction in dynamic environments and discussed evaluation metrics for prediction errors, thereby providing a reference for measuring the performance of spatio-temporal prediction models.

Based on the theoretical review and methodological analysis outlined above, it is evident that existing research has provided a solid mathematical foundation and a wealth of technical approaches for modelling spatio-temporal point processes. However, in the specific application of dynamic monitoring of invasive species, the question of how to construct predictive models that possess both explanatory power and generalisability, whilst operating under the practical constraints of limited data collection and complex ecological mechanisms, remains a topic worthy of further exploration. This study takes the invasion and diffusion of the Asian giant hornet as its empirical subject. By integrating maximum likelihood estimation, kernel methods and statistical validation techniques, it attempts to quantitatively characterise the species’ dispersal patterns and provide risk warnings within the framework of spatio-temporal Cox processes.

3. Methods

3.1. Model Parameters

All model parameters, including their symbols, physical meanings, and units, are summarized in Table 1.

3.2. Definition

Let the study area be the spatial domain $S \subset \mathbb{R}^{2}$ (corresponding to the latitude and longitude range of the State of Washington), and the study period be the temporal domain $T \subset \mathbb{R}^{+}$, spanning from the date of the first detection to the forecast horizon. Let $(s_{i}, t_{i})$ denote the spatio-temporal coordinates of the $i$-th sighting event, where $s_{i}$ is the spatial position and $t_{i}$ is the time of the sighting. The set of observed sightings constitutes a spatio-temporal point process $\{(s_{i}, t_{i})\}_{i=1}^{n}$, where $n$ is the total number of historical sightings.

In the spatio-temporal point process modelling framework, the core concept is the conditional intensity function $Q(s, t)$, which represents the instantaneous probability density of an observation occurring at location $s$ and time $t$, given the historical event information.

This study adopts the spatio-temporal Cox process framework, assuming the existence of a non-negative stochastic intensity process $\Lambda(s, t)$, whose stochasticity arises from factors such as underlying environmental heterogeneity, unobserved covariates and measurement errors, reflecting the inherent uncertainty of the intensity function itself. Given a realisation of this process, observed events follow a non-homogeneous Poisson distribution in space and time. In this case, the conditional intensity function equals this realisation, i.e., $Q(s, t) = \Lambda(s, t)$. In this study, $\Lambda(s, t)$ is parameterised as a deterministic structure; consequently, the conditional intensity function $Q(s, t)$ is used directly in subsequent modelling.

Firstly, the traditional separable spatio-temporal Cox model assumes that spatial and temporal effects are independent of one another; its conditional intensity function $Q_{\mathrm{sep}}(s, t)$ is given by Equation (1):

$$Q_{\mathrm{sep}}(s, t) = \lambda_{0} \times K_{s} \times K_{t} \times g(t), \tag{1}$$

where $\lambda_{0}$ represents the baseline intensity parameter, $g(t)$ is the seasonal term, $K_{s}$ is the spatial kernel, and $K_{t}$ is the temporal kernel. The spatial kernel $K_{s}$ and the temporal kernel $K_{t}$ contribute independently to the predicted intensity and are unable to capture the synergistic changes between spatial and temporal proximity. However, the diffusion of invasive species often exhibits spatio-temporal coupling characteristics, meaning that spatially proximate events also tend to be temporally proximate; this progressive diffusion pattern cannot be effectively characterised by separable models. Therefore, this paper proposes a non-separable model that primarily captures the coupling characteristics between spatial and temporal proximity through the spatio-temporal interaction term $h(s, t)$. Specifically, the conditional intensity function $Q(s, t)$ is defined in the following product form, as shown in Equations (2)–(5):

$$Q(s, t) = \lambda_{0} \times K_{\mathrm{joint}} \times g(t), \tag{2}$$

$$K_{\mathrm{joint}} = K_{s} \times K_{t} \times h(s, t), \tag{3}$$

$$K_{s} = \frac{1}{n} \sum_{i=1}^{n} \exp\left(-\frac{(x - x_{i})^{2}}{2\sigma_{x}^{2}} - \frac{(y - y_{i})^{2}}{2\sigma_{y}^{2}}\right), \tag{4}$$

$$K_{t} = \frac{1}{n} \sum_{i=1}^{n} \exp\left(-\frac{d_{t}^{2}}{2\sigma_{t}^{2}}\right), \tag{5}$$

where $\lambda_{0}$ is independent of the spatio-temporal kernel function and seasonal modulation effects; it is obtained through maximum likelihood estimation in conjunction with the optimisation of other model parameters, and determines the overall order of magnitude of the intensity function.

$K_{\mathrm{joint}}$ represents the joint kernel, defined as the product of the spatial kernel, the temporal kernel and the interaction term. It integrates spatial proximity, temporal proximity and their interaction effects into a unified kernel function form, and is a core component of the conditional intensity function.

$K_{s}$ employs an anisotropic Gaussian kernel to characterise the influence of historical sighting locations on the surrounding areas. Let $x_{i}$ and $y_{i}$ denote the longitude and latitude of the $i$-th historical observation event, and $x$ and $y$ denote the coordinates of the forecast point. The parameters $\sigma_{x}$ and $\sigma_{y}$ are the spatial scale parameters in the longitudinal and latitudinal directions, respectively, controlling the range of spatial influence attenuation in each dimension.

$K_{t}$ employs a Gaussian kernel to characterise the influence of historical sighting times on the prediction time point. Here $d_{t} = |t_{\mathrm{norm}} - t'_{i,\mathrm{norm}}|$ represents the normalised temporal distance between the prediction time point $t_{\mathrm{norm}}$ and the $i$-th historical sighting time point $t'_{i,\mathrm{norm}}$; time is mapped to the interval [0, 1] via a max–min normalisation. $\sigma_{t}$ is the temporal scale parameter, controlling the decay rate of the temporal influence.

Within $K_{\mathrm{joint}}$, the spatio-temporal interaction term $h(s, t)$ is designed to relax the traditional assumption of separability and constitutes the key innovation of the model. Its core function is to capture the intrinsic coupling between spatial and temporal proximity, so that the model can describe how the influence of historical events on neighbouring spatio-temporal regions varies jointly across time and space, as shown in Equation (6):

$$h(s, t) = \mathrm{clip}\left(\exp(\rho \times K_{s} \times K_{t} \times \alpha),\ 0.5,\ 3.0\right), \tag{6}$$

where $\rho$ represents the spatio-temporal interaction coefficient, and $\alpha$ represents the interaction strength coefficient. When spatial and temporal proximity are both present, the predicted intensity is further amplified. Specifically, $\rho > 0$ indicates the presence of a spatio-temporal synergy; the larger the value of $\rho$, the stronger the synergy. When $\rho = 0$, the model degenerates into a separable model, with spatio-temporal effects being independent. Furthermore, to ensure numerical stability, the interaction term is constrained within the interval [0.5, 3.0]. This serves primarily as a numerical regularisation strategy to ensure the stability of the likelihood function optimisation process, preventing parameters from becoming excessively large, which could result in the model failing to recognise patterns or the optimisation process becoming unstable. Specifically, the exponential term is constrained using the clip function $\mathrm{clip}(\cdot)$: if the calculated value is less than 0.5, it is set to 0.5; if it is greater than 3.0, it is set to 3.0.
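Equations (3)–(6) can be illustrated with a minimal NumPy sketch. The function names and the three toy sightings below are hypothetical and are not taken from the authors' scripts; the sketch simply evaluates each kernel as written and checks that $\rho = 0$ gives $h = 1$.

```python
import numpy as np

def spatial_kernel(x, y, xs, ys, sigma_x, sigma_y):
    """Eq. (4): mean anisotropic Gaussian kernel over historical sightings."""
    z = (x - xs) ** 2 / (2 * sigma_x ** 2) + (y - ys) ** 2 / (2 * sigma_y ** 2)
    return float(np.mean(np.exp(-z)))

def temporal_kernel(t_norm, ts_norm, sigma_t):
    """Eq. (5): mean Gaussian kernel on normalised time differences."""
    d_t = np.abs(t_norm - ts_norm)
    return float(np.mean(np.exp(-d_t ** 2 / (2 * sigma_t ** 2))))

def interaction(k_s, k_t, rho, alpha):
    """Eq. (6): exponential coupling of both kernels, clipped to [0.5, 3.0]."""
    return float(np.clip(np.exp(rho * k_s * k_t * alpha), 0.5, 3.0))

def joint_kernel(k_s, k_t, rho, alpha):
    """Eq. (3): product of spatial kernel, temporal kernel and interaction."""
    return k_s * k_t * interaction(k_s, k_t, rho, alpha)

# Hypothetical toy sightings: longitude, latitude, normalised time.
xs = np.array([-122.5, -122.4, -122.6])
ys = np.array([48.0, 48.1, 47.9])
ts = np.array([0.20, 0.25, 0.30])

k_s = spatial_kernel(-122.5, 48.0, xs, ys, sigma_x=0.3, sigma_y=0.3)
k_t = temporal_kernel(0.25, ts, sigma_t=0.1)

print("separable product :", k_s * k_t)
print("joint, rho = 0    :", joint_kernel(k_s, k_t, rho=0.0, alpha=5.0))
print("joint, rho = 0.1  :", joint_kernel(k_s, k_t, rho=0.1, alpha=5.0))
```

With $\rho = 0$ the joint kernel equals the separable product, and any positive $\rho$ raises it by a factor of at most 3.0 because of the clip.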

Furthermore, to capture the seasonal patterns of sightings and reflect the cyclical fluctuations in the activity of V. mandarinia throughout the year, namely, increased activity during the spring nesting period and reduced activity during the winter dormancy period, this paper introduces a seasonal term $g(t)$, as shown in Equation (7):

$$g(t) = 0.5 + 0.45 \times \sin(\omega t + \varphi), \tag{7}$$

where $t$ is the raw (unnormalised) time in days, $\omega$ is the seasonal frequency parameter, and $\varphi$ is the phase shift parameter, which adjusts the temporal position of the seasonal peaks and troughs.

In summary, the key distinction between this formula and traditional separable models lies in the introduction of the spatio-temporal interaction term $h(s, t)$. In traditional separable models, the spatial component $K_{s}$ and the temporal component $K_{t}$ appear independently in the product; whereas in the model presented in this study, $h(s, t)$ couples $K_{s}$ and $K_{t}$ in a non-linear manner, such that the predicted intensity cannot be decomposed into independent spatial and temporal components.
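The non-separability can be made explicit with a short derivation. This is a sketch that assumes the clip in Equation (6) is inactive, so that $h(s, t) = \exp(\rho\,\alpha\,K_{s}K_{t})$. Taking logarithms of Equations (2), (3) and (6) gives a cross term whose mixed derivative vanishes only when $\rho\,\alpha = 0$:

```latex
% Assumption: 0.5 <= exp(rho * alpha * K_s * K_t) <= 3.0, so the clip does not bind.
\begin{align*}
\log Q(s, t) &= \log \lambda_{0} + \log K_{s} + \log K_{t}
               + \rho\,\alpha\,K_{s} K_{t} + \log g(t), \\
\frac{\partial^{2} \log Q}{\partial K_{s}\,\partial K_{t}} &= \rho\,\alpha .
\end{align*}
```

For the separable model of Equation (1) the same mixed derivative is zero, since $\log Q_{\mathrm{sep}}$ is a sum of terms that each depend on only one of $K_{s}$ and $K_{t}$.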

3.3. Model Structure

This paper proposes the STIK-Cox model, a spatio-temporal point process framework designed for predicting the diffusion of invasive species. By constructing a non-separable conditional intensity function, the model captures the spatio-temporal coupling dynamics inherent in biological invasions. To enhance the model’s transparency, this study further introduces the SHAP interpretability analysis method to quantify the marginal contribution of each feature to intensity predictions.

This approach aligns with ecological principles, namely that the dispersal patterns of invasive species exhibit intrinsic spatio-temporal coupling characteristics, whereby spatial proximity and temporal proximity are not independent of one another but jointly influence the intensity of sighting events.

First, the model parameters are initialised based on domain knowledge. Specifically, the baseline intensity parameter $\lambda_{0}$ is initialised to 0.5, representing the average event occurrence rate. The spatial scale parameters $\sigma_{x}$ and $\sigma_{y}$ are initialised to 0.3°, reflecting a typical spatial extent of influence. The temporal scale parameter $\sigma_{t}$ is initialised to 0.1, controlling the decay rate of the temporal influence. The spatio-temporal interaction coefficient $\rho$ is initialised to 0.1, representing the initial strength of the spatio-temporal synergy. The interaction strength coefficient $\alpha$ is initialised to 5.0, controlling the sensitivity of the interaction term. The seasonal frequency $\omega$ is initialised to $2\pi/365$, corresponding to the annual cycle, and the phase shift $\varphi$ is initialised to 0.

In addition, the model incorporates a SHAP module, which uses a background dataset sampled from the training data to provide baseline predictions for calculating feature contributions. The background dataset typically comprises 200 randomly selected samples to ensure computational efficiency whilst maintaining representativeness.

During the training phase, the model follows an iterative optimisation process. For each data point $(x, y, t)$, the spatial distance $d_{s}$ and temporal distance $d_{t}$ from all historical events are calculated. Subsequently, the spatial kernel $K_{s}$, temporal kernel $K_{t}$ and interaction term $h(s, t)$ are computed to form the joint kernel. The intensity function $Q(s, t)$ is calculated as the product of the baseline intensity, the mean of the joint kernel and the seasonal modulation term.

To estimate the model parameters, the algorithm minimises the negative log-likelihood function, with the objective function shown in Equation (8):

$$L(\Theta) = -\sum_{i=1}^{n} \log Q(s_{i}, t_{i}; \Theta) + |S| \times T \times \frac{1}{N} \sum_{j=1}^{N} Q(s_{j}, t_{j}; \Theta), \tag{8}$$

where $\Theta = \{\lambda_{0}, \sigma_{x}, \sigma_{y}, \sigma_{t}, \rho, \omega, \varphi, \alpha\}$ denotes the parameter set, $Q(s_{i}, t_{i}; \Theta)$ denotes the predicted intensity value at spatial location $s_{i}$ and time $t_{i}$ under the parameter set $\Theta$, $n$ is the number of observations, and $T$ is the total time span. The second term employs Monte Carlo integration to approximate the integral of the intensity function across the spatio-temporal domain. Specifically, $N = 200$ sample points are randomly drawn from the spatial domain $S$ and temporal domain $T$, and the intensity $Q(s_{j}, t_{j}; \Theta)$ is computed at each sample point. The integral is estimated as the mean intensity multiplied by the domain volume $|S| \times T$, where $|S|$ denotes the area of spatial domain $S$.
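The objective in Equation (8) can be sketched with NumPy and SciPy's L-BFGS-B solver. The synthetic events, the parameter bounds and the function names below are assumptions made for illustration and do not come from the authors' scripts; the kernels are averaged separately as written in Equations (4) and (5), and time is normalised by the total span for simplicity. The initial values follow the initialisation described above.

```python
import numpy as np
from scipy.optimize import minimize

def intensity(theta, x, y, t_days, t_norm, ev_x, ev_y, ev_tn):
    """Q(s, t) of Eqs. (2)-(7), evaluated at arrays of query points."""
    lam0, sx, sy, st, rho, omega, phi, alpha = theta
    k_s = np.exp(-(x[:, None] - ev_x) ** 2 / (2 * sx ** 2)
                 - (y[:, None] - ev_y) ** 2 / (2 * sy ** 2)).mean(axis=1)
    k_t = np.exp(-(t_norm[:, None] - ev_tn) ** 2 / (2 * st ** 2)).mean(axis=1)
    h = np.clip(np.exp(rho * k_s * k_t * alpha), 0.5, 3.0)
    g = 0.5 + 0.45 * np.sin(omega * t_days + phi)
    return lam0 * k_s * k_t * h * g

def neg_log_lik(theta, events, mc_points, volume):
    """Eq. (8): event log-intensities plus a Monte Carlo integral estimate."""
    ev_x, ev_y, ev_days, ev_tn = events
    q_ev = intensity(theta, ev_x, ev_y, ev_days, ev_tn, ev_x, ev_y, ev_tn)
    q_mc = intensity(theta, *mc_points, ev_x, ev_y, ev_tn)
    # The tiny constant only guards the logarithm against exact zeros.
    return -np.sum(np.log(q_ev + 1e-300)) + volume * np.mean(q_mc)

# Synthetic stand-in data (hypothetical): 40 clustered events over 670 days.
rng = np.random.default_rng(0)
LON, LAT, T = (-125.0, -116.0), (45.0, 49.5), 670.0
n = 40
ev_days = np.sort(rng.uniform(0.0, T, n))
events = (rng.normal(-122.5, 0.2, n), rng.normal(48.0, 0.2, n), ev_days, ev_days / T)

# N = 200 uniform Monte Carlo points over the space-time domain.
N = 200
mc_days = rng.uniform(0.0, T, N)
mc_points = (rng.uniform(*LON, N), rng.uniform(*LAT, N), mc_days, mc_days / T)
volume = (LON[1] - LON[0]) * (LAT[1] - LAT[0]) * T  # |S| x T

# Order: lambda0, sigma_x, sigma_y, sigma_t, rho, omega, phi, alpha.
theta0 = np.array([0.5, 0.3, 0.3, 0.1, 0.1, 2 * np.pi / 365, 0.0, 5.0])
# Bounds are illustrative assumptions, not values reported in the paper.
bounds = [(1e-6, None), (1e-3, 5.0), (1e-3, 5.0), (1e-3, 1.0),
          (0.0, 5.0), (1e-4, 0.1), (-np.pi, np.pi), (0.1, 10.0)]

nll0 = neg_log_lik(theta0, events, mc_points, volume)
res = minimize(neg_log_lik, theta0, args=(events, mc_points, volume),
               method="L-BFGS-B", bounds=bounds, options={"maxiter": 200})
print("initial NLL:", nll0)
print("final NLL  :", res.fun)
print("estimates  :", res.x)
```

Because the Monte Carlo points are drawn once and held fixed, the objective is deterministic during optimisation. With only 200 points the integral estimate is noisy, so the fitted values in this toy setting should not be read as meaningful.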

The key innovation of the proposed model lies in the spatio-temporal interaction term $h(s, t)$, which captures the coupling effect between spatial and temporal proximity. The larger the value of $\rho$, the stronger the spatio-temporal synergy; when $\rho = 0$, the model degenerates into a separable form.

Once parameter estimation is complete, the SHAP analysis module calculates the feature importance scores for each sample. The SHAP value $\phi_{k}$ for feature $k$ is computed using the Shapley value formula, as shown in Equation (9):

$$\phi_{k} = \sum_{S \subseteq N \setminus \{k\}} \frac{|S|!\,(|N| - |S| - 1)!}{|N|!} \left[f(S \cup \{k\}) - f(S)\right], \tag{9}$$

where $N$ represents the set of all features, $S$ denotes a subset of features that excludes feature $k$, $|S|$ denotes the size of subset $S$, $f(S)$ represents the model’s prediction using only the features in subset $S$, and $f(S \cup \{k\}) - f(S)$ denotes the marginal contribution of feature $k$. This formula evaluates the marginal contribution of each feature across all possible feature subsets, and the global feature importance is derived from the average absolute SHAP value across all samples.
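Equation (9) can be evaluated exactly by enumerating subsets when the number of features is small. The sketch below is a brute-force illustration in plain Python, not the authors' SHAP module; the toy model and the convention of replacing absent features with a single baseline value are assumptions made for the example.

```python
from itertools import combinations
from math import factorial

def shapley_values(value_fn, n_features):
    """Exact Shapley values (Eq. 9) by enumerating all feature subsets."""
    phi = [0.0] * n_features
    for k in range(n_features):
        others = [j for j in range(n_features) if j != k]
        for size in range(len(others) + 1):
            weight = (factorial(size) * factorial(n_features - size - 1)
                      / factorial(n_features))
            for subset in combinations(others, size):
                s = frozenset(subset)
                phi[k] += weight * (value_fn(s | {k}) - value_fn(s))
    return phi

# Hypothetical toy model with one additive term and one interaction term.
def model(x):
    return 2.0 * x[0] + x[1] * x[2]

sample = (1.0, 2.0, 3.0)
baseline = (0.0, 0.0, 0.0)  # stands in for a background-data average

def value_fn(subset):
    """f(S): features in S take the sample value, the rest the baseline."""
    mixed = [sample[j] if j in subset else baseline[j] for j in range(3)]
    return model(mixed)

phi = shapley_values(value_fn, 3)
print(phi)  # the interaction x1 * x2 = 6 is shared equally between features 1 and 2
```

The values sum to `model(sample) - model(baseline)`, which is the efficiency property of Shapley values. Enumeration costs grow exponentially with the number of features, which is why practical SHAP tools rely on sampling or model-specific shortcuts.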

For the prediction task, the model generates a spatial grid covering the study area and calculates the intensity $Q(s, t)$ at each grid point. Predicted sighting locations are sampled probabilistically according to the normalised intensity distribution, ensuring that high-intensity regions have a higher selection probability whilst maintaining spatial coverage. The proposed model architecture is shown in Figure 1.

4. Experiments

We conduct experiments on the dataset to validate the model’s effectiveness, comparing it against baseline methods across multiple metrics, including the comprehensive accuracy score and the capture rate. Ablation studies further confirm the contribution of each module, while a robustness analysis indicates stable performance.

4.1. Software Implementation

All analyses and model implementations in this study are conducted in Python 3.12.4. The STIK-Cox model, parameter estimation procedure, Monte Carlo integration, residual analysis, SHAP analysis, and all evaluation metrics are implemented using custom Python scripts. Numerical computation is performed using NumPy 1.26.4 and SciPy 1.17.1, data processing is conducted using pandas 3.0.1, model evaluation is implemented using scikit-learn 1.8.0, and all figures are generated using Matplotlib 3.10.8. The L-BFGS-B algorithm implemented in SciPy is used for constrained parameter optimization. No R packages, including the R package spatstat, are used to generate the results reported in this manuscript. Reference [12] is cited only for its theoretical relevance to spatial point process analysis, rather than for software implementation. The computations are performed on a workstation equipped with an Intel Core i7-13650HX processor and 16 GB RAM.

4.2. Data Preparation and Preprocessing

The dataset used in this study is obtained from the 2021 Mathematical Contest in Modeling (MCM) Problem C, and it contains public records of V. mandarinia sighting reports from the Washington State Department of Agriculture (WSDA). Each record includes geographic coordinates (latitude and longitude) indicating the location of the observed specimen, detection date recording when the sighting occurred, and lab verification status classified as Positive ID (confirmed by laboratory analysis), Unprocessed (pending verification), or Unverified (reported but not yet confirmed). Sightings are reported by both citizens and professional surveyors through the WSDA reporting system.

To ensure data quality and the effectiveness of model training, this study designs an automated pre-processing workflow, comprising steps such as data cleaning, outlier handling, feature normalisation, and spatio-temporal statistical analysis.

Firstly, the raw data is cleaned and filtered. This study retains records with the status “Positive ID”, “Unprocessed” and “Unverified”. Subsequently, records lacking key information (longitude, latitude, and detection date) are removed, and detection dates are converted to standard time format. To exclude early inaccurate reports, only sightings occurring on or after 1 January 2019 are retained. Furthermore, for duplicate records with identical latitude and longitude coordinates, only the first occurrence is retained to avoid bias caused by spatial clustering. Finally, based on the geographical scope of the study area, sightings are filtered to include those with longitudes in the range [−125.0°, −116.0°] and latitudes in the range [45.0°, 49.5°]. Following the above pre-processing, a total of 2221 valid sighting records are obtained, spanning a period of 670 days. To prevent scale bias arising from features of different orders of magnitude during model training, this study employs the max-min normalisation method to process the time variable, as shown in Equation (10):

$$t_{\mathrm{norm}} = \frac{t - t_{\mathrm{min}}}{t_{\mathrm{max}} - t_{\mathrm{min}}}, \tag{10}$$

where $t_{\mathrm{min}}$ represents the number of days corresponding to the earliest detection date, $t_{\mathrm{max}}$ represents the number of days corresponding to the latest detection date, and $t_{\mathrm{norm}}$ is the normalised time value, with a range of [0, 1]. This normalisation ensures that the temporal and spatial scales are comparable, preventing excessively large time values from dominating the optimisation process.
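The cleaning steps and Equation (10) can be sketched in pandas. The column names (`Lab Status`, `Detection Date`, `Latitude`, `Longitude`) and the toy records are assumptions made for illustration; the actual file layout is not described in this section.

```python
import pandas as pd

KEEP_STATUS = {"Positive ID", "Unprocessed", "Unverified"}

def preprocess(df):
    """Filter sightings as described above and add raw and normalised time."""
    out = df[df["Lab Status"].isin(KEEP_STATUS)].copy()
    out["Detection Date"] = pd.to_datetime(out["Detection Date"], errors="coerce")
    out = out.dropna(subset=["Longitude", "Latitude", "Detection Date"])
    out = out[out["Detection Date"] >= pd.Timestamp("2019-01-01")]
    out = out.drop_duplicates(subset=["Latitude", "Longitude"], keep="first")
    in_box = (out["Longitude"].between(-125.0, -116.0)
              & out["Latitude"].between(45.0, 49.5))
    out = out[in_box].sort_values("Detection Date").reset_index(drop=True)
    days = (out["Detection Date"] - out["Detection Date"].min()).dt.days
    out["t_days"] = days
    out["t_norm"] = days / days.max()  # Eq. (10) with t_min shifted to 0
    return out

# Hypothetical toy records; only rows 0, 1 and 7 should survive the filters.
raw = pd.DataFrame({
    "Lab Status": ["Positive ID", "Unverified", "Negative ID", "Unprocessed",
                   "Unverified", "Unverified", "Unverified", "Unprocessed"],
    "Latitude": [48.0, 48.1, 47.0, 48.0, 47.5, 40.0, 48.2, 48.3],
    "Longitude": [-122.5, -122.4, -122.0, -122.5, -122.3, -100.0, None, -122.2],
    "Detection Date": ["2019-09-19", "2020-05-01", "2020-06-01", "2020-07-01",
                       "2018-06-01", "2020-08-01", "2020-08-15", "2020-10-01"],
})
clean = preprocess(raw)
print(clean[["Latitude", "Longitude", "t_days", "t_norm"]])
```

The sketch assumes at least two distinct detection dates remain after filtering; otherwise the denominator in Equation (10) is zero.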

Subsequently, this paper conducts a comprehensive spatio-temporal statistical analysis of sightings. As shown in Figure 2a, the graph displays the number of observations aggregated by year and month. These data indicate that the number of sightings exhibits distinct seasonal fluctuations over time. The horizontal axis represents time, whilst the vertical axis represents the number of observations. The peak period occurs in summer, followed by a gradual decline. This pattern is consistent with the life cycle of V. mandarinia: spring is the critical period when queen wasps emerge from hibernation to establish nests; summer is the season when colonies expand rapidly with hundreds to thousands of worker wasps actively foraging; late summer to early autumn is when new queens and males are produced and disperse; winter marks the overwintering period with minimal activity. As shown in Figure 2b, sightings exhibit a distinct non-random clustered distribution in space. The horizontal axis represents longitude, the vertical axis represents latitude, and different colours indicate different months. The results show that a large number of sightings are concentrated in a core area between longitudes −123° and −122° and latitudes 47° and 48.5°, forming a dense spatial cluster. As shown in Figure 2c, the kernel density estimate heatmap transitions from yellow (low density) to deep red (high density), revealing spatial clustering hotspots. These clustering patterns indicate that the diffusion of invasive species is not random but is influenced by a combination of factors including geographical environment, climatic conditions and ecological suitability.

To verify the spatio-temporal clustering of the data, this paper employs spatial statistical methods suitable for sparse point process data. Given that approximately 56% of the days in this dataset are zero-event days, traditional time series autocorrelation tests (such as the Ljung–Box test) are not applicable. Consequently, this study employs Ripley’s K-function $K_{a}(r)$, the L-function $L(r)$ and the temporal K-function $K_{d}(t)$ to analyse the spatio-temporal clustering characteristics of the data. $K_{a}(r)$ measures the degree of point clustering within a spatial distance $r$; $L(r)$, a linearised transformation of $K_{a}(r)$, stabilises the variance and makes clustering scales easier to identify; and $K_{d}(t)$ measures the degree of event clustering within a time lag $t$.
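
The spatial statistics above can be illustrated with a minimal sketch. It assumes a naive estimator without edge correction and the centred form $L(r) = \sqrt{K(r)/\pi} - r$, which is zero under CSR; the function names `ripley_k` and `centred_l` and the synthetic point sets are hypothetical and are not taken from the study's implementation.

```python
import numpy as np

def ripley_k(points, radii, area):
    """Naive Ripley's K estimate (no edge correction) for 2-D points."""
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    # Pairwise Euclidean distances; the diagonal (self-pairs) is excluded.
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    pair_dist = dist[~np.eye(n, dtype=bool)]
    # K(r) = area / (n (n - 1)) * number of ordered pairs closer than r.
    counts = np.array([(pair_dist <= r).sum() for r in radii])
    return area * counts / (n * (n - 1))

def centred_l(k_values, radii):
    """Centred L-function: sqrt(K(r) / pi) - r, which is zero under CSR."""
    return np.sqrt(np.asarray(k_values) / np.pi) - np.asarray(radii)

rng = np.random.default_rng(0)
radii = np.linspace(0.02, 0.2, 10)
# Synthetic data on the unit square: uniform points versus one tight cluster.
uniform = rng.uniform(0.0, 1.0, size=(200, 2))
clustered = np.clip(rng.normal(0.5, 0.05, size=(200, 2)), 0.0, 1.0)
l_uniform = centred_l(ripley_k(uniform, radii, area=1.0), radii)
l_clustered = centred_l(ripley_k(clustered, radii, area=1.0), radii)
```

In this sketch the clustered pattern gives clearly positive values of the centred L-function, while the uniform pattern stays close to zero. Without edge correction the uniform curve drifts slightly negative at larger radii.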

As shown in Figure 3a, $K_{a}(r)$ lies consistently above the theoretical curve under complete spatial randomness (CSR) and exceeds the upper limit of the 95% confidence envelope, indicating that sightings exhibit significant spatial clustering. As shown in Figure 3b, the maximum value of $L(r)$ is 1.0475, corresponding to a spatial scale of approximately 0.43°. The red shaded region marks where $L(r) > 0$ and exceeds the 95% confidence upper limit, confirming the statistical significance of the spatial clustering. As shown in Figure 3c, the $K_{d}(t)$ curve consistently exceeds the theoretical value under complete temporal randomness (CTR) and lies entirely above the 95% confidence upper limit, indicating that sightings also exhibit significant clustering in the temporal dimension; that is, events tend to concentrate within specific time periods rather than being uniformly distributed across the entire study period. These results validate the necessity of constructing a spatio-temporal point process model and support the central hypothesis of the STIK-Cox model: that spatial proximity and temporal proximity jointly influence the intensity of sighting events.

To rigorously validate the model fit, this paper employs a thinning-based residual analysis. This method transforms the observed point pattern into a thinned residual process that should approximate a homogeneous Poisson process if the model adequately captures the clustering structure. The spatial residual K-function $K_{g}(r)$ and the temporal residual K-function $K_{h}(t)$ are then computed on the thinned residuals, and 95% confidence envelopes are constructed from 99 CSR simulations.
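
The thinning step can be sketched as follows. This is a minimal illustration of random thinning, in which each observed point is retained with probability equal to a floor intensity divided by the fitted intensity at that point. The function name `thin_residuals`, the fallback choice of floor and the synthetic intensity are assumptions made here and do not describe the study's code.

```python
import numpy as np

def thin_residuals(points, intensity, floor=None, seed=0):
    """Keep point i with probability floor / intensity[i] (random thinning)."""
    pts = np.asarray(points, dtype=float)
    lam = np.asarray(intensity, dtype=float)
    if np.any(lam <= 0):
        raise ValueError("fitted intensities must be strictly positive")
    # Assumption: if no floor is supplied, the smallest fitted intensity at
    # the observed points stands in for the infimum over the whole window.
    b = lam.min() if floor is None else float(floor)
    keep_prob = np.minimum(1.0, b / lam)
    rng = np.random.default_rng(seed)
    keep = rng.uniform(size=len(pts)) < keep_prob
    return pts[keep], keep_prob

rng = np.random.default_rng(1)
demo_points = rng.uniform(0.0, 1.0, size=(500, 3))   # synthetic (lon, lat, time)
demo_intensity = 1.0 + 4.0 * demo_points[:, 0]       # made-up fitted intensity
residual_points, probs = thin_residuals(demo_points, demo_intensity)
```

If the fitted intensity is adequate, the retained points should resemble a homogeneous Poisson process, which is what the residual K-functions then test.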

As shown in Figure 4a, the $K_{g}(r)$ curve falls entirely within the 95% CSR envelope, indicating that the spatial residuals are consistent with complete spatial randomness. As shown in Figure 4b, the $K_{h}(t)$ curve similarly lies within the 95% envelope, indicating that the temporal residuals are consistent with a uniform distribution across the study period. These results show that the thinned residuals are statistically indistinguishable from a homogeneous Poisson process in both the spatial and temporal dimensions, which supports the conclusion that the fitted spatio-temporal Cox model adequately captures the clustering structure in the observed data.

4.3. The Model Training

The STIK-Cox model is fitted by maximum likelihood estimation, using the L-BFGS-B algorithm to determine the optimal parameter values subject to bound constraints.

To improve training efficiency, this study introduces a domain-knowledge-based parameter initialisation strategy and an adaptive boundary constraint mechanism, whereby parameters are restricted to physically reasonable ranges during the optimisation process.

$\lambda_{0} \in [0.1, 3]$ ensures that the baseline intensity remains at a reasonable level; $\sigma_{x} \in [0.15, 0.6]$ and $\sigma_{y} \in [0.15, 0.6]$ ensure that the spatial influence ranges correspond to the typical diffusion distance of V. mandarinia; $\sigma_{t} \in [0.05, 0.25]$ controls the temporal decay rate; $\rho \in [0, 1]$ ensures the non-negativity and stability of the interaction terms; $\alpha \in [1.0, 15.0]$ regulates the interaction sensitivity; $\omega$ is fixed at $2\pi/365$ to correspond to the annual cycle; and $\varphi \in [-\pi, \pi]$ covers the full seasonal cycle.

This paper uses latitude and longitude coordinates and normalised time as input features, with the intensity of observed events $Q(s, t)$ as the target variable. To keep the time variable on a scale comparable to the other inputs, this paper applies min-max normalisation to map it to the interval [0, 1]. Spatial coordinates retain their original units of measurement to preserve the physical significance of geographical distances. The data are divided into a training set and a test set; the training set is used to learn spatio-temporal patterns, whilst the test set is used to validate predictive capabilities.
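
The normalisation of the time variable can be written out explicitly. The symbols $\tilde{t}_i$, $t_{\min}$ and $t_{\max}$ are notation introduced here for illustration, with $t_{\min}$ and $t_{\max}$ assumed to be the earliest and latest observation times in the data:

```latex
\tilde{t}_i = \frac{t_i - t_{\min}}{t_{\max} - t_{\min}}, \qquad \tilde{t}_i \in [0, 1].
```

For example, under this assumption a sighting recorded 100 days after the first observation in a record spanning 400 days maps to $\tilde{t} = 100/400 = 0.25$.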

Model training employs the negative log-likelihood as the objective function, which exhibits favourable statistical properties in point process models. Parameter optimisation uses the L-BFGS-B algorithm, which supports parameter boundary constraints and offers high computational efficiency, making it suitable for optimisation problems in parameter spaces of moderate size. To avoid local optima, this study devises a strategy involving multiple sets of initial values, comprising five distinct parameter initialisation configurations. The initial loss function value for each configuration is calculated and evaluated; optimisation is only performed when the initial loss value falls within a reasonable range. During the optimisation process, the algorithm iteratively updates the parameters until the convergence criteria are met.
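
The bounded multi-start procedure can be sketched with SciPy's `minimize` and its `L-BFGS-B` method. The bounds are those quoted in the text, but everything else is an assumption for illustration: `toy_nll` is a stand-in objective rather than the model's negative log-likelihood, the five starting points are drawn uniformly within the bounds rather than from domain knowledge, and `max_initial_loss` is a hypothetical threshold for what counts as a reasonable initial loss.

```python
import numpy as np
from scipy.optimize import minimize

# Bounds quoted in the text, in the order
# (lambda0, sigma_x, sigma_y, sigma_t, rho, alpha, phi); omega stays fixed.
BOUNDS = [(0.1, 3.0), (0.15, 0.6), (0.15, 0.6), (0.05, 0.25),
          (0.0, 1.0), (1.0, 15.0), (-np.pi, np.pi)]
LO = np.array([b[0] for b in BOUNDS])
HI = np.array([b[1] for b in BOUNDS])

def toy_nll(theta):
    """Stand-in objective (NOT the paper's likelihood): a scaled bowl whose
    minimum sits at the midpoint of every bound."""
    return float(np.sum(((theta - (LO + HI) / 2.0) / (HI - LO)) ** 2))

def multi_start(objective, bounds, n_starts=5, max_initial_loss=1e6, seed=0):
    """Run L-BFGS-B from several starts and keep the fit with the lowest loss."""
    rng = np.random.default_rng(seed)
    lo = np.array([b[0] for b in bounds])
    hi = np.array([b[1] for b in bounds])
    best = None
    for _ in range(n_starts):
        x0 = rng.uniform(lo, hi)
        f0 = objective(x0)
        # Skip starts whose initial loss is non-finite or implausibly large.
        if not np.isfinite(f0) or f0 > max_initial_loss:
            continue
        res = minimize(objective, x0, method="L-BFGS-B", bounds=bounds)
        if best is None or res.fun < best.fun:
            best = res
    return best

fit = multi_start(toy_nll, BOUNDS)
```

Replacing `toy_nll` with the model's actual negative log-likelihood would leave the surrounding loop unchanged.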

During training, the STIK-Cox model progressively optimises the parameters of the conditional intensity function by maximising the log-likelihood function. Once training is complete, the SHAP analysis module calculates feature importance scores based on the Kernel SHAP method. A background dataset is constructed by randomly sampling 200 instances from the training data to provide baseline predictions. For each instance to be interpreted, the SHAP value is obtained by traversing all possible feature subsets and calculating marginal contributions, which are then aggregated for each feature according to the Shapley weights. Global feature importance is derived from the mean absolute Shapley value across all instances, revealing how each feature contributes to the intensity prediction.

The experimental results indicate that the model parameter estimates are stable. The estimated value of the spatio-temporal interaction coefficient $\rho$ is approximately 0.6, suggesting a positive spatio-temporal synergy, which is consistent with the biological behaviour of nest-centred expansion. The estimated values of the spatial scale parameters $\sigma_{x}$ and $\sigma_{y}$ are approximately 0.35° and 0.30°, respectively, reflecting the typical activity range of V. mandarinia. The seasonal term parameter $\omega$ is fixed to an annual cycle, whilst the estimated value of the phase shift $\varphi$ modulates the temporal position of seasonal peaks and troughs. The model demonstrates good predictive performance on the test set, effectively capturing the spatio-temporal clustering characteristics of invasion events.

5. Results

To evaluate model performance, the data are divided into a training set and a test set by repeated random sampling, and the experiment is repeated 50 times. The training set is used for model parameter estimation, whilst the test set is used to verify the model’s generalisation ability. This validation approach avoids the data leakage that arises when the entire dataset is used for both training and evaluation, thereby ensuring the reliability of the evaluation results. The metrics used to evaluate model performance are the capture rate, the mean prediction error, the K-function correlation coefficient (K-coefficient) and the comprehensive accuracy score. The K-coefficient assesses the similarity between the predicted distribution and the actual distribution in terms of spatial patterns; the capture rate measures the proportion of predicted points falling within a specific neighbourhood of actual sightings; the mean prediction error reflects the average deviation between the predicted location and the nearest actual sighting point; and the comprehensive accuracy score integrates the aforementioned metrics to provide an overall evaluation. The comprehensive accuracy score is defined in Equation (11):

$$O = 0.45 \times D_{dist} + 0.10 \times A_{K} + 0.45 \times C_{cap}, \tag{11}$$

where $O$ represents the comprehensive accuracy score; $D_{dist}$ represents the distance accuracy component, which measures the average proximity between predicted points and actual points; $A_{K}$ represents the spatial pattern similarity component, which measures the consistency between the predicted distribution and the actual distribution in terms of spatial clustering characteristics; and $C_{cap}$ represents the capture rate score component, which measures the model’s ability to capture actual points under multi-scale distance thresholds. This scoring method comprehensively evaluates the model’s location accuracy, spatial distribution consistency and multi-scale capture capability. The score lies in [0, 1], with higher values indicating better predictive performance. The weighting coefficients are set according to the practical importance of each metric: distance accuracy and capture rate, as core metrics, are each assigned a weight of 0.45, whilst spatial pattern similarity, as a supplementary metric, is assigned a lower weight of 0.10, to balance the evaluation contributions of localisation accuracy and spatial distribution consistency.
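
The evaluation metrics can be sketched in a few lines. The example assumes Euclidean distances measured in degrees (matching the units in which the errors are reported) and treats the three components of Equation (11) as already scaled to [0, 1], since the text does not state how each component is derived from the raw metrics. The function names and the coordinates are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree

def nearest_distances(predicted, actual):
    """Distance from each predicted point to its nearest actual sighting."""
    tree = cKDTree(np.asarray(actual, dtype=float))
    dist, _ = tree.query(np.asarray(predicted, dtype=float), k=1)
    return dist

def capture_rate(predicted, actual, threshold):
    """Share of predicted points within `threshold` of an actual sighting."""
    return float(np.mean(nearest_distances(predicted, actual) <= threshold))

def composite_score(d_dist, a_k, c_cap):
    """Equation (11): O = 0.45 * D_dist + 0.10 * A_K + 0.45 * C_cap."""
    return 0.45 * d_dist + 0.10 * a_k + 0.45 * c_cap

# Made-up (lon, lat) pairs, for illustration only.
actual = np.array([[-122.5, 48.0], [-122.7, 48.9]])
predicted = np.array([[-122.5, 48.1], [-122.0, 47.0], [-122.7, 48.9]])
mean_error = float(nearest_distances(predicted, actual).mean())
rate = capture_rate(predicted, actual, threshold=0.5)
```

Because the three weights sum to one, the composite score stays in [0, 1] whenever each component does.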

The evaluation results indicate that the model demonstrates excellent predictive performance on the test set. The model’s predictions of invasion intensity for the next 30, 90 and 180 days are shown in Figure 5. Figure 5a displays the distribution of the model’s predictions for invasion intensity over the next 30 days; the background heatmap transitions from yellow (low intensity) to deep red (high intensity), whilst the blue scatter points represent confirmed sightings. Predicted intensity is highest in the core area of north-western Washington State, with hotspots showing a high degree of overlap with areas of high historical sighting density, thereby validating the effectiveness of the spatial kernel function. Predicted intensity is lower in peripheral areas, reflecting the strong correlation between the spatial distribution of invasive species and historical sighting locations. Figure 5b displays the invasion intensity forecast for the next 90 days. Differences in intensity between different prediction time points are primarily driven by the seasonal modulation term $g(t)$, reflecting the periodic fluctuations in the activity of the invasive species. Figure 5c presents the long-term prediction results for the next 180 days. Due to the periodic modulation effect of the seasonal term $g(t)$, the overall intensity in the long-term prediction exhibits periodic variations, reflecting the temporal dynamics of the invasion intensity. The hotspot is observable across all three prediction time scales; these results provide a scientific basis for resource allocation and the formulation of monitoring strategies.

5.1. Robustness Analysis

To assess the model’s stability in the face of data perturbations and noise, a robustness analysis is conducted in this study. Specifically, Gaussian noise ($\sigma = 0.01°$, approximately 1.1 km) is added to the longitude and latitude coordinates of the raw data to simulate positioning errors that may occur in actual monitoring, and changes in the model’s predictive performance are observed. In Figure 6, panel (a) displays the model’s prediction results based on the original data, whilst panel (b) shows the results based on the data with added noise. Both panels present the predicted intensity distribution against a heatmap background, with predicted points marked by blue dots and red triangles, respectively. A comparative analysis indicates that the two sets of predictions exhibit a high degree of consistency in the spatial patterns of the predicted intensity distribution, with the primary high-incidence areas concentrated in north-western Washington State and along the US-Canada border. The overall prediction results change little after noise is added, and the spatial distribution of the prediction points remains highly consistent, indicating that the model tolerates data perturbations well. Furthermore, this paper compares the original data and the noisy data across multiple evaluation metrics. The comprehensive accuracy score for the original data is 0.957, whilst that for the noisy data is 0.942, a difference of only 0.015. The capture rate at a 0.5° threshold is 98.74% for the original data and 97.87% for the noisy data, a decrease of less than one percentage point. The K-coefficient falls from 0.9879 to 0.9863. These results demonstrate that the model maintains stable predictive performance when faced with positional errors within a reasonable range, thereby validating its robustness and reliability.
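
The perturbation used in the robustness analysis can be sketched as follows. The function name `perturb_coordinates`, the seed and the synthetic coordinates are assumptions for illustration; only the noise level of 0.01° is taken from the text. The conversion uses the approximate length of one degree of latitude.

```python
import numpy as np

KM_PER_DEGREE_LAT = 111.32  # approximate length of one degree of latitude

def perturb_coordinates(coords, sigma_deg=0.01, seed=0):
    """Add independent Gaussian noise (in degrees) to lon/lat pairs."""
    coords = np.asarray(coords, dtype=float)
    rng = np.random.default_rng(seed)
    return coords + rng.normal(0.0, sigma_deg, size=coords.shape)

# Synthetic coordinates: 5000 copies of one (lon, lat) location.
coords = np.column_stack([np.full(5000, -122.5), np.full(5000, 48.0)])
noisy = perturb_coordinates(coords)
shift_km = 0.01 * KM_PER_DEGREE_LAT  # north-south displacement for 0.01 degrees
```

The figure of roughly 1.1 km applies to the north-south direction. At a latitude of about 48° N, one degree of longitude is shorter by a factor of cos(48°) ≈ 0.67, so the same 0.01° corresponds to roughly 0.75 km east-west.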

5.2. Interpretability Analysis

In the field of spatio-temporal modelling, model interpretability has long been a key focus of research aimed at enhancing model transparency [28,29]. As model complexity increases, the ability to interpret predictive outcomes becomes crucial. In this paper, each data feature is treated as a player in a cooperative game whose payoff is the model’s predicted intensity; the marginal contribution of each feature to the final prediction is then calculated.

Each feature is assigned a SHAP value, computed over all possible feature subsets, that represents its contribution to the prediction result. The specific calculation process in this study is as follows:

Step 1: For each sample, extract the raw data features (longitude, latitude, time).

Step 2: Use the model to predict the intensity values for both the original samples and perturbed samples in which certain features have been replaced with benchmark values.

Step 3: Calculate the difference between the predicted results for the perturbed samples and those for the original samples; this difference represents the marginal contribution of the modified feature.

Step 4: Calculate the weighted average for all possible feature subsets according to the Shapley value formula to obtain the final SHAP value for each feature.
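
Steps 1 to 4 can be sketched as an exact Shapley computation. With only three features there are $2^3 = 8$ feature subsets, so they can be enumerated directly. The function `toy_intensity` is a hypothetical stand-in for the fitted intensity and the background sample is synthetic; neither reflects the study's model or data.

```python
from itertools import combinations
from math import factorial
import numpy as np

def shapley_values(model, x, background):
    """Exact Shapley values of `model` at `x` by enumerating feature subsets."""
    x = np.asarray(x, dtype=float)
    background = np.asarray(background, dtype=float)
    n = x.size

    def value(subset):
        # Features in `subset` take the instance's values; the others keep
        # their background values, and the predictions are averaged.
        mixed = background.copy()
        idx = list(subset)
        if idx:
            mixed[:, idx] = x[idx]
        return float(np.mean(model(mixed)))

    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(subset + (i,)) - value(subset))
    return phi

def toy_intensity(features):
    """Hypothetical stand-in for the fitted intensity of (lon, lat, time)."""
    lon, lat, t = features[:, 0], features[:, 1], features[:, 2]
    spatial = np.exp(-(lon + 122.5) ** 2 - (lat - 48.0) ** 2)
    return spatial * (1.0 - 0.5 * t)

rng = np.random.default_rng(0)
background = np.column_stack([rng.uniform(-124, -121, 200),
                              rng.uniform(46, 49, 200),
                              rng.uniform(0, 1, 200)])
x = np.array([-122.5, 48.0, 0.2])
phi = shapley_values(toy_intensity, x, background)
```

The values in `phi` sum to the difference between the prediction at `x` and the mean prediction over the background sample. Averaging the absolute values over many instances gives the global importance ranking.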

The detailed results are shown in Figure 7. In Figure 7a, the x-axis represents the SHAP values, i.e., the magnitude and direction of the influence of each feature on the model output (predicted intensity). The y-axis displays the feature names (Longitude, Latitude, Time). The colours indicate the magnitude of the feature values, with a colour bar on the right: red indicates high feature values, whilst blue indicates low feature values. In Figure 7b, the x-axis represents the mean absolute SHAP values, and the y-axis displays the feature names.

The SHAP analysis reveals that longitude has the highest mean absolute SHAP value among all features, indicating that it is the primary driver of the intensity prediction. The impact on predicted intensity varies markedly with longitude, in line with the spatial distribution of sightings, which suggests that the model captures east–west diffusion patterns. Time ranks second. Low time values (earlier observations) contribute positively, whilst high values contribute negatively; this reflects the decay of the temporal kernel, whereby longer intervals since historical events yield lower predicted intensity. This aligns with the temporal proximity assumption and indicates that temporal clustering is captured. Latitude ranks third: variations in latitude influence predicted intensity, but their contribution is lower than that of longitude and time, potentially because the latitudinal span is relatively small, so that north–south differences in hornet activity are less pronounced than east–west variations. The kernel components $(K_{s}, K_{t}, h(s, t))$ are derived from these raw features; the model maps them through the kernel functions into the predicted intensity, and the SHAP values quantify each raw feature’s marginal contribution.

5.3. Model Comparison

To evaluate the performance of the proposed spatio-temporal interactive joint kernel Cox model, this paper conducts comparative experiments on the dataset provided by WSDA, comparing it with several benchmark models, including Inverse Distance Weighting (IDW), the Temporal Trend Model (TTM), the Poisson Point Process (PPP), the Gaussian Diffusion Model (GDM) and Complete Spatial Randomness (CSR). Fifty independent trials are conducted on the original dataset, and the results are averaged. Evaluation is performed using metrics such as the comprehensive accuracy score, capture rate, mean error and K-function correlation. The results are shown in Table 2.
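
For orientation, the IDW baseline can be sketched in its generic textbook form, where the value at a query location is a weighted mean of observed values with weights proportional to $1/d^{p}$. How IDW was configured in the comparison is not stated here, so the function name `idw_predict`, the power $p = 2$ and the toy inputs are assumptions.

```python
import numpy as np

def idw_predict(sites, values, queries, power=2.0, eps=1e-12):
    """Inverse distance weighting: weighted mean with weights 1 / d**power."""
    sites = np.asarray(sites, dtype=float)
    values = np.asarray(values, dtype=float)
    queries = np.asarray(queries, dtype=float)
    # Distance from every query point to every observation site.
    dist = np.linalg.norm(queries[:, None, :] - sites[None, :, :], axis=-1)
    # `eps` guards against division by zero when a query sits on a site.
    weights = 1.0 / np.maximum(dist, eps) ** power
    return (weights * values).sum(axis=1) / weights.sum(axis=1)

# Two observation sites with values 1 and 3; queries at the midpoint and at a site.
demo = idw_predict([[0.0, 0.0], [2.0, 0.0]], [1.0, 3.0],
                   [[1.0, 0.0], [0.0, 0.0]])
```

Unlike the proposed model, this interpolation has no temporal or interaction component, which is why it serves as a purely spatial reference point.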

On the original dataset, the STIK-Cox model achieves a comprehensive accuracy score of 0.957, narrowly ahead of IDW (0.953) and clearly ahead of the PPP (0.930), the GDM (0.926), the TTM (0.865) and CSR (0.849). The STIK-Cox model therefore attains the highest comprehensive accuracy score among the compared methods on the original data.

In terms of capture rate, STIK-Cox performs best at lower distance thresholds: the capture rate is 76.15% at 0.1°, 91.22% at 0.2° and 98.74% at 0.5°. Although the 0.5° figure is slightly lower than that of IDW (99.00%), the results at the smaller thresholds show that the model is capable of accurately predicting the locations of sightings within a smaller spatial range. The average prediction error is only 0.0802°, significantly lower than that of other comparison models. The K-coefficient reaches 0.9879, indicating that the predicted distribution is highly consistent with the actual distribution in terms of spatial patterns.

To verify the contribution of the spatio-temporal interaction term, this paper conducts ablation experiments comparing the performance of the full model, a model with the interaction term removed ($\rho = 0$), a model with the seasonal term removed, a model with only the spatial kernel, and a model with only the temporal kernel. The results are shown in Table 3.

Based on the results of the ablation experiments, it is clear that the STIK-Cox model outperforms other ablation variants across all metrics. The results indicate that the STIK-Cox model possesses a significant advantage in capturing the spatio-temporally coupled dynamics of invasive species; the contribution of its spatio-temporal interaction terms leads to a marked improvement in the comprehensive accuracy score, thereby validating the effectiveness of the non-separable design. The spatial kernel is a key component for prediction; models relying solely on temporal information exhibit a substantial decline in performance, which is consistent with the nest-centred spatial dispersal pattern of V. mandarinia. Furthermore, the seasonal term effectively captures the periodic patterns of species activity, contributing significantly to improved capture rates. The ablation experiments confirm that the synergistic interaction of all components in the complete model achieves optimal predictive performance, providing a reliable methodological foundation for early warning of invasive species dispersal.

6. Conclusions

This study proposes a spatio-temporal dynamic prediction framework for invasive species based on the STIK-Cox model, and validates it using V. mandarinia as a case study. The model achieves a comprehensive accuracy score of 0.957, with a capture rate of 98.74% at a 0.5° threshold (approximately 56 km) and an average prediction error of only 0.0802°, outperforming conventional methods such as inverse distance weighting and the temporal trend model on the comprehensive accuracy score. SHAP explainability analysis reveals that longitude is the primary driver of predicted intensity, with time and latitude ranking second and third, respectively, demonstrating the model’s effective capture of spatial location and temporal evolution trends. Robustness analysis indicates that the framework maintains high predictive performance even under Gaussian noise conditions, validating its stability in complex dynamic monitoring scenarios. Beyond prediction accuracy, the model offers practical implications for invasive species management. The spatio-temporal diffusion predictions at multiple time horizons (30, 90 and 180 days) provide spatially explicit guidance for prioritising monitoring resources, with higher predicted intensity regions warranting increased surveillance effort. The seasonal modulation term $g(t)$ informs the temporal allocation of management activities. The positive interaction parameter $\rho$ indicates gradual diffusion from nest centres, consistent with targeted eradication at core sites and buffer zone establishment at diffusion fronts. These findings demonstrate the potential of the model as a decision-support tool, though application to real-world management scenarios would benefit from further validation across different invasive species and geographic contexts.

Although the application of non-separable spatio-temporal point processes in the management of invasive species is still in its infancy, the proposed framework holds great promise for tasks involving continuous spatio-temporal forecasting. For instance, in early warning systems, the model can be used to identify high-risk areas for targeted monitoring and resource allocation; in the field of ecological monitoring, the method can be employed to predict the diffusion trajectories of other invasive species with similar dispersal mechanisms. These applications are still in the exploratory phase, but they represent a new frontier in the field of ecological informatics, combining point process modelling with interpretability analysis.

Finally, although the model proposed in this study demonstrates high performance and excellent robustness in predicting the diffusion patterns of V. mandarinia, there is still room for improvement. The current model focuses on a specific geographical region and a single-species invasion scenario. Future research will be extended to multi-species and cross-regional datasets, and will further integrate environmental covariates such as climate data, land-use patterns and vegetation indices to enhance the model’s generalisation ability and ecological interpretability in practical applications. Furthermore, by combining more efficient computational algorithms with real-time data assimilation techniques to enable dynamic updates of model parameters, we aim to construct an adaptive and scalable prediction system. We believe these improvements will facilitate the application of this research to a broader range of invasive species management scenarios and advance the development of theories and techniques related to spatio-temporal statistics and ecological modelling.

Author Contributions

Conceptualization, Y.Z., J.W. and X.W.; Methodology, Y.Z., J.W. and X.W.; Formal analysis, Y.Z., J.W. and X.W.; Data curation, Y.L. and J.W.; Investigation, Y.Z., Y.L., S.W. and J.W.; Supervision, R.Y.; Writing—original draft, Y.Z. and Y.L.; Writing—review and editing, R.Y., J.W. and X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the “AI + Education” Course Project in Wannan Medical University in 2025, grant number 2025aikc02; Student Innovation Training Program in Anhui Province in 2024, grant number 202410368070.

Data Availability Statement

The original data presented in the study are openly available at https://www.comap.org/membership/member-resources/item/confirming-the-buzz-about-hornets (accessed on 28 December 2025).

Acknowledgments

We sincerely express our gratitude to all participants.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Pimentel, D.; Zuniga, R.; Morrison, D. Update on the environmental and economic costs associated with alien-invasive species in the United States. Ecol. Econ. 2005, 52, 273–288.
Simberloff, D.; Martin, J.L.; Genovesi, P.; Maris, V.; Wardle, D.A.; Aronson, J.; Courchamp, F.; Galil, B.; García-Berthou, E.; Pascal, M.; et al. Impacts of biological invasions: What’s what and the way forward. Trends Ecol. Evol. 2013, 28, 58–66.
Seebens, H.; Blackburn, T.M.; Dyer, E.E.; Genovesi, P.; Hulme, P.E.; Jeschke, J.M.; Pagad, S.; Pyšek, P.; Van Kleunen, M.; Winter, M.; et al. Global rise in emerging alien species results from increased accessibility of new source pools. Proc. Natl. Acad. Sci. USA 2018, 115, E2264–E2273.
Wilson, T.M.; Takahashi, J.; Spichiger, S.E.; Kim, I.; Van Westendorp, P. First reports of Vespa mandarinia (Hymenoptera: Vespidae) in North America represent two separate maternal lineages in Washington state, United States, and British Columbia, Canada. Ann. Entomol. Soc. Am. 2020, 113, 468–472.
Zhu, G.; Gutierrez Illan, J.; Looney, C.; Crowder, D.W. Assessing the ecological niche and invasion potential of the Asian giant hornet. Proc. Natl. Acad. Sci. USA 2020, 117, 24646–24648.
Alaniz, A.J.; Carvajal, M.A.; Vergara, P.M. Giants are coming? Predicting the potential diffusion and impacts of the giant Asian hornet (Vespa mandarinia, Hymenoptera: Vespidae) in the USA. Pest Manag. Sci. 2021, 77, 104–112.
Matsuura, M.; Yamane, S. Biology of the Vespine Wasps; Springer: Berlin/Heidelberg, Germany, 1990; pp. xix+323.
Hooten, M.B.; Wikle, C.K. Statistical agent-based models for discrete spatio-temporal systems. J. Am. Stat. Assoc. 2010, 105, 236–248.
Gallien, L.; Münkemüller, T.; Albert, C.H.; Boulangeat, I.; Thuiller, W. Predicting potential distributions of invasive species: Where to go from here? Divers. Distrib. 2010, 16, 331–342.
Bröcker, J.; Smith, L.A. Scoring probabilistic forecasts: The importance of being proper. Weather Forecast. 2007, 22, 382–388.
Silverman, B.W. Density Estimation for Statistics and Data Analysis; Monographs on Statistics and Applied Probability; Chapman and Hall: London, UK, 1986.
Baddeley, A.; Rubak, E.; Turner, R. Spatial Point Patterns: Methodology and Applications with R; CRC Press: Boca Raton, FL, USA, 2016; Volume 1.
Lundberg, S.M.; Lee, S.I. A unified approach to interpreting model predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
Lundberg, S.M.; Erion, G.; Chen, H.; DeGrave, A.; Prutkin, J.M.; Nair, B.; Katz, R.; Himmelfarb, J.; Bansal, N.; Lee, S.I. From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2020, 2, 56–67.
Pawitan, Y. In All Likelihood: Statistical Modelling and Inference Using Likelihood, 2nd ed.; Oxford University Press: Oxford, UK, 2026.
Pyšek, P.; Richardson, D.M. Invasive species, environmental change and management, and health. Annu. Rev. Environ. Resour. 2010, 35, 25–55.
Waagepetersen, R. Log Gaussian Cox processes. In Tagungsbericht 09/1998. Mathematische Stochastik; Aarhus University: Aarhus, Denmark, 1999; pp. 23–24.
Møller, J.; Díaz-Avalos, C. Structured spatio-temporal shot-noise Cox point process models, with a view to modelling forest fires. Scand. J. Stat. 2010, 37, 2–25.
Møller, J.; Waagepetersen, R.P. Statistical Inference and Simulation for Spatial Point Processes; CRC Press: Boca Raton, FL, USA, 2003.
Casella, G.; Berger, R. Statistical Inference, 2nd ed.; Chapman and Hall/CRC: Boca Raton, FL, USA, 2024.
Guan, Y. A least-squares cross-validation bandwidth selection approach in pair correlation function estimations. Stat. Probab. Lett. 2007, 77, 1722–1729.
Rathbun, S.L. Asymptotic properties of the maximum likelihood estimator for spatio-temporal point processes. J. Stat. Plan. Inference 1996, 51, 55–74.
Ogata, Y. Estimators for stationary point processes. Ann. Inst. Stat. Math. 1978, 30, 243–261.
Byrd, R.H.; Lu, P.; Nocedal, J.; Zhu, C. A limited memory algorithm for bound constrained optimization. SIAM J. Sci. Comput. 1995, 16, 1190–1208.
Cressie, N.; Wikle, C.K. Statistics for Spatio-Temporal Data; John Wiley & Sons: Hoboken, NJ, USA, 2011.
Møller, J. Shot noise Cox processes. Adv. Appl. Probab. 2003, 35, 614–640.
Mohamed, A.; Zhu, D.; Vu, W.; Elhoseiny, M.; Claudel, C. Social-implicit: Rethinking trajectory prediction evaluation and the effectiveness of implicit maximum likelihood estimation. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2022; pp. 463–479.
Adadi, A.; Berrada, M. Peeking inside the black-box: A survey on explainable artificial intelligence (XAI). IEEE Access 2018, 6, 52138–52160.
Arrieta, A.B.; Díaz-Rodríguez, N.; Del Ser, J.; Bennetot, A.; Tabik, S.; Barbado, A.; García, S.; Gil-López, S.; Molina, D.; Benjamins, R.; et al. Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Inf. Fusion 2020, 58, 82–115.

Figure 1. Block diagram of the STIK-Cox model. It illustrates the complete workflow of the STIK-Cox model, comprising the following main stages: (1) Data input and pre-processing: Cleaning and normalising the raw sighting data. (2) Calculation of kernel components: Calculating $K_{joint}$, $g(t)$, $K_{s}$, $K_{t}$, $h(s, t)$ and $\lambda_{0}$ separately. (3) Intensity function calculation: $Q(s, t) = \lambda_{0} \times K_{joint} \times g(t)$. (4) Parameter optimisation: The L-BFGS-B algorithm is employed to minimise the negative log-likelihood function, iteratively updating the parameter set. (5) Spatio-temporal point prediction: Based on the optimised intensity distribution, predicted locations for future sightings are generated using probabilistic sampling methods. (6) SHAP analysis: The Kernel SHAP method is employed to calculate the marginal contributions of each feature, outputting a ranking of feature importance to support the interpretability of the model’s predictions. (7) Result visualisation: Heatmaps of intensity distributions, maps of predicted locations, and SHAP feature importance plots are generated to intuitively illustrate the invasion diffusion patterns and the model’s interpretative results.

Figure 2. Spatio-temporal distribution of sighting events, shown in three sub-figures. (a) Bar chart of the temporal distribution of sightings. (b) Scatter plot of the spatial distribution of sightings. (c) Heatmap of the kernel density estimate of sightings.

Figure 3. Spatio-temporal clustering analysis. Figure 3 consists of three sub-figures. (a) shows the Ripley’s K-statistic analysis, where the $K_{a}(r)$ values are significantly higher than the upper confidence limit under complete spatial randomness (CSR), indicating significant spatial clustering. (b) shows the L-function analysis plot, where $L(r) > 0$ and exceeds the upper confidence limit, indicating significant aggregation; the red shaded areas mark the regions of significant aggregation. (c) shows the temporal K-function analysis plot, where the $K_{d}(t)$ value is higher than the theoretical CTR value, indicating significant temporal aggregation.
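
The K- and L-statistics referred to in this caption can be sketched as follows. This is a minimal version under stated assumptions: it uses the naive estimator $\hat{K}(r) = \frac{A}{n(n-1)} \sum_{i \neq j} \mathbf{1}[d_{ij} \le r]$ without edge correction, a planar Euclidean distance, and the centred form $L(r) = \sqrt{K(r)/\pi} - r$, which is zero under CSR. The confidence envelope shown in the figure would additionally require repeated CSR simulations, which are omitted here. The point sets are synthetic.

```python
import math
import random

def ripley_k(points, r, area):
    """Naive Ripley's K: area / (n (n - 1)) times the number of ordered pairs within r."""
    n = len(points)
    count = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(n):
            if i != j and math.hypot(xi - points[j][0], yi - points[j][1]) <= r:
                count += 1
    return area * count / (n * (n - 1))

def l_function(points, r, area):
    """Centred L(r) = sqrt(K(r) / pi) - r; positive values suggest clustering."""
    return math.sqrt(ripley_k(points, r, area) / math.pi) - r

# Synthetic examples on the unit square (not the sighting data).
rng = random.Random(0)
clustered = [(rng.gauss(0.5, 0.02), rng.gauss(0.5, 0.02)) for _ in range(40)]
grid = [((i + 0.5) / 7, (j + 0.5) / 7) for i in range(7) for j in range(7)]

l_clustered = l_function(clustered, 0.1, area=1.0)  # tight cluster: L(r) > 0
l_grid = l_function(grid, 0.1, area=1.0)            # regular grid: L(r) < 0
```

The tight cluster yields a positive $L(r)$, while the regular grid, whose spacing exceeds $r$, has no pairs within range and yields $L(r) = -r$.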

Figure 4. Thinning-based residual analysis for model validation. (a) shows the spatial residual K-function $K_{g}(r)$, where the curve falls entirely within the 95% CSR envelope, indicating that the spatial residuals are consistent with complete spatial randomness. (b) shows the temporal residual K-function $K_{h}(t)$, where the curve similarly lies within the 95% CSR envelope, confirming that the temporal residuals approximate a homogeneous Poisson process. These results demonstrate that the fitted spatio-temporal Cox model adequately captures the clustering structure in the observed data.
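
The thinning step behind this kind of residual analysis can be sketched as below. The idea, in its standard form, is to keep each observed point with probability $b / \hat{λ}(\text{point})$, where $b$ is a lower bound on the fitted intensity; if the fitted model is adequate, the retained points behave like a homogeneous Poisson process with rate $b$ and can then be checked with a K-function. The intensity `lambda_hat`, the bound `floor`, and the simulated points are all hypothetical stand-ins, and the sketch is purely spatial for brevity.

```python
import random

def thin_residuals(points, fitted_intensity, floor, rng):
    """Keep each point with probability floor / lambda_hat(point).

    `floor` should not exceed the fitted intensity anywhere in the window.
    """
    kept = []
    for point in points:
        lam = fitted_intensity(point)
        if lam < floor:
            raise ValueError("floor exceeds the fitted intensity at an observed point")
        if rng.random() < floor / lam:
            kept.append(point)
    return kept

def lambda_hat(point):
    """Hypothetical fitted intensity on the unit square, increasing with x."""
    x, _y = point
    return 50.0 + 150.0 * x

rng = random.Random(42)

# Simulate points from lambda_hat by rejection sampling (upper bound 200).
observed = []
while len(observed) < 300:
    candidate = (rng.random(), rng.random())
    if rng.random() < lambda_hat(candidate) / 200.0:
        observed.append(candidate)

# Thinned residual points; these would be passed to a K-function check.
residual_points = thin_residuals(observed, lambda_hat, floor=50.0, rng=rng)
```

Points in high-intensity regions are discarded more often, so the retained set should look spatially uniform when the fitted intensity matches the process that generated the data.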

Figure 5. Spatio-temporal diffusion prediction for V. mandarinia. This figure shows forecast distribution maps for three different forecast time scales: (a) T + 30 days, (b) T + 90 days, and (c) T + 180 days.

Figure 6. Model robustness analysis. (a) shows the prediction results based on the original data; (b) shows the prediction results based on data with Gaussian noise ($σ = 0.01°$) added. The background shows a heatmap of the predicted intensity distribution.

Figure 7. SHAP interpretability analysis. (a) A summary plot of SHAP values, showing the distribution of SHAP values for each feature, with colours indicating the magnitude of the values; (b) A bar chart of feature importance, showing the average absolute SHAP values for each feature.
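
To make the quantity plotted in panel (b) concrete, the sketch below computes Shapley values and then averages their absolute values per feature. Note the substitution: Kernel SHAP approximates Shapley values by sampling feature coalitions, whereas this sketch enumerates every coalition exactly, which is only practical for a handful of features. The `toy_model`, the baseline, and the sample points are hypothetical and unrelated to the fitted model.

```python
from itertools import combinations
from math import factorial

def shapley_values(model, x, baseline):
    """Exact Shapley values of `model` at `x`, relative to `baseline`."""
    n = len(x)

    def value(subset):
        # Features in `subset` take their value from x, the rest from the baseline.
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi

def mean_abs_importance(model, samples, baseline):
    """Average absolute Shapley value per feature across samples."""
    sums = [0.0] * len(baseline)
    for x in samples:
        for i, v in enumerate(shapley_values(model, x, baseline)):
            sums[i] += abs(v)
    return [s / len(samples) for s in sums]

def toy_model(z):
    """Hypothetical model with one additive term and one interaction term."""
    return 2.0 * z[0] + z[1] * z[2]

baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, [1.0, 2.0, 3.0], baseline)
importance = mean_abs_importance(
    toy_model, [[1.0, 2.0, 3.0], [-1.0, 2.0, 3.0]], baseline)
```

For the toy model the interaction term is split equally between the two features involved, and the values sum to the difference between the model output at the sample and at the baseline.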

Table 1. List of parameters used in the article.

Parameter	Symbol	Physical Meaning	Unit
Baseline intensity	$λ_{0}$	Average event occurrence rate	events/(km²·day)
Spatial scale (longitude)	$σ_{x}$	Spatial influence range in longitudinal direction	degrees (°)
Spatial scale (latitude)	$σ_{y}$	Spatial influence range in latitudinal direction	degrees (°)
Temporal scale	$σ_{t}$	The decay rate of the temporal influence	dimensionless
Spatio-temporal interaction coefficient	$ρ$	Strength of spatio-temporal synergy	dimensionless
Interaction strength coefficient	$α$	Sensitivity of interaction term	dimensionless
Seasonal frequency	$ω$	Periodicity of seasonal activity	rad/day
Phase shift	$φ$	Temporal position of seasonal peaks	radians

Table 2. Performance comparison of different models, including STIK-Cox, IDW, TTM, PPP, GDM, and CSR, on the raw and Gaussian noise datasets, using capture rate at 0.1°, capture rate at 0.2°, capture rate at 0.5°, mean error, K correlation coefficient, and comprehensive accuracy score metrics.

Model	Capture Rate at 0.1°	Capture Rate at 0.2°	Capture Rate at 0.5°	Mean Error (°)	K Correlation Coefficient	Comprehensive Accuracy Score
STIK-Cox	76.15%	91.22%	98.74%	0.0802	0.9879	0.957
IDW	66.00%	88.34%	99.00%	0.1024	0.9535	0.953
TTM	25.34%	52.56%	95.23%	0.2221	0.9495	0.865
PPP	61.01%	75.12%	98.00%	0.1239	0.9830	0.930
GDM	49.23%	77.34%	97.67%	0.1404	0.9641	0.926
CSR	27.45%	58.67%	84.78%	0.2572	0.9491	0.849
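
The distance-based columns of this table can be illustrated with a short sketch. The table does not define the metrics, so the definitions here are assumptions: for each observed sighting, the error is the planar distance in degrees to the nearest predicted location; the capture rate at a threshold is the share of sightings whose error does not exceed it; and the mean error is the average of those distances. The coordinates are made up for the example, and the K correlation coefficient and comprehensive accuracy score are not reproduced.

```python
import math

def nearest_distances(observed, predicted):
    """Distance (in degrees) from each observed point to its nearest prediction."""
    return [min(math.hypot(ox - px, oy - py) for px, py in predicted)
            for ox, oy in observed]

def capture_rate(distances, threshold):
    """Fraction of observed points with a prediction within `threshold` degrees."""
    return sum(d <= threshold for d in distances) / len(distances)

def mean_error(distances):
    """Average nearest-prediction distance in degrees."""
    return sum(distances) / len(distances)

# Made-up coordinates for illustration only.
observed = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
predicted = [(0.0, 0.05), (1.0, 0.3), (2.0, 0.6)]

distances = nearest_distances(observed, predicted)
rates = {threshold: capture_rate(distances, threshold)
         for threshold in (0.1, 0.2, 0.5)}
error = mean_error(distances)
```

By construction the capture rate can only increase with the threshold, which matches the left-to-right pattern in each row of the table.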

Table 3. Results of ablation experiments comparing the complete STIK-Cox model against four variants: w/o Interaction term, w/o Seasonality term, Temporal Only, and Spatial Only, in terms of capture rate at 0.1°, capture rate at 0.2°, capture rate at 0.5°, mean error, K correlation coefficient, and comprehensive accuracy score performance metrics.

Model	Capture Rate at 0.1°	Capture Rate at 0.2°	Capture Rate at 0.5°	Mean Error (°)	K Correlation Coefficient	Comprehensive Accuracy Score
STIK-Cox	76.15%	91.22%	98.74%	0.0802	0.9879	0.957
w/o Interaction	62.57%	87.55%	96.48%	0.0979	0.9714	0.912
w/o Seasonality	73.38%	89.08%	97.83%	0.0826	0.9815	0.931
Spatial Only	73.25%	87.46%	97.50%	0.0865	0.9771	0.922
Temporal Only	28.72%	53.78%	91.69%	0.2316	0.9485	0.857
