1. Introduction
The spread of invasive pests, including non-native insects, within urban treescapes, woodlands, and forests is having profound environmental, economic, and social impacts [1,2,3]. In the UK, invasive species are estimated to have cost GBP 5–13 billion since 1976 through damages, management costs, and impacts on ecosystem services [4]. Thus, the UK government has identified enhancing biosecurity as a key priority, aiming to control existing pests and build resilience against emerging concerns by harnessing computational modelling methods [5].
Statistical and mathematical computational models can be used to explore the fundamental behaviours of pest infestations and facilitate quantitative predictions of the future spread. Commonly employed models include dynamical system models consisting of differential equations that describe pest population numbers and movements across landscapes, such as those for the grey squirrel in Wales [6]. Stochastic epidemic models [7] are more commonly used to describe disease dynamics within a population, but are transferable to the application of invasive pest spread [8]. In our previous work [9], we demonstrated that such an approach was indeed transferable to exploring invasive pest dynamics, and we build upon that framework here.
A key invasive pest of concern for treescapes within the UK and northern mainland Europe is the oak processionary moth (OPM), Thaumetopoea processionea (Linnaeus, 1758) (Lepidoptera: Notodontidae). OPM is a univoltine lepidopteran that feeds on Quercus species [10]. Female moths lay eggs on branches of the tree canopy in the summer, with larvae (caterpillars) emerging in the spring of the next year [11,12]. The larvae go through six instars, grouping together to form aggregates and constructing communal silk nests in the later instars [12]. The larvae pupate in the nests, with adult moths emerging in mid-July [12,13]. Native to southern Europe, OPM first became established in the UK in 2006 through an accidental import.
OPM is destructive to oak trees, causing defoliation, which can leave infested trees vulnerable to other stressors [12]. Protecting the oak tree population is crucial for promoting biodiversity, with thousands of species known to be supported by the oak [14,15]. Additionally, OPM larvae have poisonous hairs, which contain an urticating toxin that is harmful to both human and animal health [16,17]. Despite great efforts to contain the UK infestation to the originally affected area of south-east England [17], the range of OPM continues to expand, with an expansion rate estimated at 1.7 km/year for 2006–2014, increasing to 6 km/year from 2015 onward [13]. There is evidence to suggest that the regions surrounding the current infestation area are particularly climatically suitable [18] and, thus, the prediction and control of the OPM population at its outer extent are especially crucial. Previous models for OPM have included species distribution models to predict future infestation under climate change [18] and electric network theory models to predict high-risk regions [19].
In our previous work [9], we considered the temporal population of OPM in two London parks (Bushy Park and Richmond Park), applying a novel Bayesian inference scheme to estimate the parameters for a compartmental epidemic model with a time-varying infestation rate. This showed that the infestation rate in both parks remained relatively constant between 2013 and 2021, despite the control methods in place, resulting in the observed continual expansion of the infestation area [13].
In this paper, we build upon the work in [9], challenging the assumption that the infestations within the two parks are independent of each other, given their neighbouring geographical locations. Thus, here we consider a two-node compartmental epidemic model, using analogous computational inference methods (making use of a linear Gaussian approximation to the stochastic susceptible-infected-removed (SIR) model and a Markov chain Monte Carlo scheme) to estimate the infestation parameters, including a quantification of each park’s influence upon its neighbour. The data, model, and inference scheme are detailed in Section 2, with the results presented in Section 3 and further discussion in Section 4. Our findings demonstrate the applicability of a two-node compartmental model to describe OPM spread and provide a framework applicable to other partially observed time series for infestations.
2. Materials and Methods
In this section, we present the observational OPM data (Section 2.1), detail the two-node SIR epidemic model (Section 2.2) and outline the statistical methods used to estimate the model parameters (Section 2.3 and Section 2.4).
2.1. The Data
We consider the OPM infestation within two London parks: Bushy Park and Richmond Park. The data consist of the numbers and locations (eastings and northings) of removed OPM nests between 2013 and 2021. The data are collected and processed by The Royal Parks and shared with the Forestry Commission to inform the national OPM Control Programme. The University of Southampton (GeoData) provides analyses and support, and holds the data on behalf of the Forestry Commission.
A summary of the OPM nest presence (and immediate removal) in both parks is shown in Figure 1a,b. We consider the cumulative time series of previously infested trees (those recorded as having nests removed) as our observed data in the following sections, shown in Figure 1c. This corresponds to the ‘Removed’ category prevalence in the compartmental SIR model presented in the next section, as in [9]. We refer to this as a partially observed dataset, as we only have information about one of the categories in the compartmental SIR model. The neighbouring geographical location of the two parks, shown in Figure 1d, motivates our choice to consider a two-node epidemic network model, outlined in the next section.
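As a small illustration of how such an observed series can be assembled, the sketch below accumulates yearly counts of newly treated trees into a cumulative ‘Removed’ series and converts it to the equivalent S + I series for a closed population. The yearly counts, the population size, and the function names are hypothetical placeholders, not the Royal Parks data or the processing used in this work.

```python
from itertools import accumulate

def cumulative_removed(new_trees_per_year):
    """Cumulative number of trees recorded as having had nests removed."""
    return list(accumulate(new_trees_per_year))

def susceptible_plus_infested(removed_series, population_size):
    """In a closed population N = S + I + R, observing R fixes the sum S + I."""
    return [population_size - r for r in removed_series]

# Hypothetical counts of trees entering the 'Removed' class each year.
yearly_new = [5, 12, 30, 41, 60]
removed = cumulative_removed(yearly_new)
s_plus_i = susceptible_plus_infested(removed, population_size=1000)

print(removed)   # cumulative 'Removed' series
print(s_plus_i)  # equivalent S + I series for N = 1000 (assumed)
```

Only one compartment is recorded, which is why the series is described as partially observed.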
2.2. The Two-Node Model
In [9], we applied a stochastic SIR epidemic model [22] to the spread of OPM in Bushy Park and Richmond Park between 2013 and 2021. Here, we expand this model, noting that the two parks are close in geographical location, as shown in Figure 1d, and, thus, the OPM dynamics within each park may not be independent of each other. We therefore consider a similar stochastic SIR model but for two connected nodes (here representing each of the two parks), as illustrated in Figure 2. Further mathematical details of the stochastic SIR epidemic model can be found in [22,23], with details of its formulation as a discrete-valued Markov jump process (MJP) in [24].
Within each node, the fixed population of trees transitions between the compartmental states: S, susceptible (not yet infested); I, infested (currently infested); and R, removed (no longer infested or contributing to the infestation spread). The transition between the infested and removed states is governed by the removal rate parameter. The transition between the S and I compartments is governed not only by the standard infestation rate parameter (the rate at which contact of one infested tree with one susceptible tree will result in infestation, referred to as the ‘effective contact rate’), but also by an additional infestation ‘pressure’ from the neighbouring node, described by two further parameters: the pressure applied by node 1 on node 2, and the pressure from node 2 on node 1. A similar model has been previously proposed to describe national surveillance counts from the 2013 to 2015 West Africa Ebola outbreak [25].
We assume that the effective contact rate and the removal rate are the same in both parks, as parameters inherent to the OPM population under similar conditions. The effective contact rate can be expressed in terms of the number of contacts (opportunities for transmission) and the transmissibility of the disease (here, the pest). Since the transmissibility is inherent to the system, the assumption of equal effective contact rates in each node results in a number of contacts that is proportional to the population size within each node.
The dynamics of all compartment states in the two-node model above are most naturally described by an MJP, whereby state numbers are described via a continuous-time Markov process with a discrete state space, reflecting the fact that states change abruptly and discretely in time [7]. As noted in [26], this can be computationally prohibitive for models in which typical population sizes are more than a few hundred. Therefore, we eschew the MJP representation in favour of a tractable continuous approximation via a stochastic differential equation (SDE). We describe the SDE below before considering a further tractable approximation known in the stochastic kinetics literature as the linear noise approximation (LNA). We refer the reader to [27] for further details on the SDE and LNA approximation of an MJP.
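To make the jump-process description concrete, the following is a minimal sketch of an exact (Gillespie) simulation of a two-node SIR model. The hazard form used here (a mass-action term `beta*S_i*I_i` within each node plus a cross-node term `a_ji*S_i*I_j`), the parameter names `beta`, `gamma`, `a12`, `a21`, and all numerical values are assumptions made for illustration; they are not taken from this work.

```python
import random

# Effect of each event on the state (S1, I1, S2, I2), in the order used by hazards().
JUMPS = [(-1, 1, 0, 0), (0, -1, 0, 0), (0, 0, -1, 1), (0, 0, 0, -1)]

def hazards(state, beta, gamma, a12, a21):
    """Event rates under the ASSUMED hazard form.

    a12 is the pressure of node 1 on node 2; a21 is the reverse.
    """
    s1, i1, s2, i2 = state
    return [
        beta * s1 * i1 + a21 * s1 * i2,  # new infestation in node 1
        gamma * i1,                      # removal in node 1
        beta * s2 * i2 + a12 * s2 * i1,  # new infestation in node 2
        gamma * i2,                      # removal in node 2
    ]

def gillespie(state, params, t_end, rng):
    """Simulate one exact path of the jump process up to time t_end."""
    state = tuple(state)
    t = 0.0
    times, path = [t], [state]
    while True:
        h = hazards(state, *params)
        total = sum(h)
        if total <= 0.0:      # no infested trees left, so nothing more can happen
            break
        t += rng.expovariate(total)   # waiting time to the next event
        if t > t_end:
            break
        event = rng.choices(range(4), weights=h)[0]  # which event fires
        state = tuple(x + d for x, d in zip(state, JUMPS[event]))
        times.append(t)
        path.append(state)
    return times, path

# Hypothetical initial state and parameters (beta, gamma, a12, a21).
rng = random.Random(1)
times, path = gillespie((990, 10, 995, 5), (1e-4, 0.5, 1e-5, 1e-5), t_end=5.0, rng=rng)
print(len(path) - 1, "events; final state", path[-1])
```

Every event is simulated individually, so the cost grows with the total event rate. This is the practical reason a continuous approximation becomes attractive for larger populations.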
The corresponding stochastic differential equation model considers the latent process $\{X_t, t \geq 0\}$ with $X_t = (S_{1,t}, I_{1,t}, S_{2,t}, I_{2,t})^\top$, where $S_{i,t}$ and $I_{i,t}$ denote the number of trees in each of the compartments S and I in node i at time $t$. The complete SDE model can be described by

$$dX_t = a(X_t, \theta)\,dt + \sqrt{b(X_t, \theta)}\,dW_t, \tag{1}$$

where $X_t$ is the state of the system at time t, $\theta$ is the vector of parameter values and $W_t$ denotes uncorrelated standard Brownian motion processes on each of the compartmental states. The SDE drift function $a(X_t, \theta)$ and diffusion coefficient $b(X_t, \theta)$ are specified in (2)–(5). Since the SDE specified by (1)–(5) cannot be solved analytically, we replace the intractable analytic solution with a tractable Gaussian process approximation: the LNA, described in the next section.
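As an illustrative sketch of how an SDE of this kind can be simulated, the code below builds a drift vector and diffusion matrix from event hazards and advances the state with the Euler–Maruyama scheme. The hazard form (within-node term `beta*S_i*I_i` plus a cross-node term `a_ji*S_i*I_j`), the parameter names, and the numerical values are assumptions for illustration only. Rather than taking a matrix square root of the diffusion matrix, the sketch drives each event type with its own Brownian increment, which gives the same drift and diffusion.

```python
import math
import random

# Effect of each event on (S1, I1, S2, I2): infest 1, remove 1, infest 2, remove 2.
JUMPS = [(-1, 1, 0, 0), (0, -1, 0, 0), (0, 0, -1, 1), (0, 0, 0, -1)]

def hazards(x, beta, gamma, a12, a21):
    """ASSUMED hazards; states are clipped at zero because a diffusion can dip below it."""
    s1, i1, s2, i2 = (max(v, 0.0) for v in x)
    return [beta * s1 * i1 + a21 * s1 * i2, gamma * i1,
            beta * s2 * i2 + a12 * s2 * i1, gamma * i2]

def drift_and_diffusion(x, params):
    """a = sum_k jump_k * h_k and b = sum_k jump_k jump_k^T * h_k."""
    h = hazards(x, *params)
    a = [sum(JUMPS[k][i] * h[k] for k in range(4)) for i in range(4)]
    b = [[sum(JUMPS[k][i] * JUMPS[k][j] * h[k] for k in range(4))
          for j in range(4)] for i in range(4)]
    return a, b

def euler_maruyama(x0, params, t_end, dt, rng):
    """Approximate one SDE path on a grid of width dt."""
    x = [float(v) for v in x0]
    path = [tuple(x)]
    for _ in range(int(round(t_end / dt))):
        h = hazards(x, *params)  # hazards are frozen over the step
        for k in range(4):
            # Mean h*dt plus Gaussian noise with variance h*dt for event type k.
            incr = h[k] * dt + math.sqrt(h[k] * dt) * rng.gauss(0.0, 1.0)
            for i in range(4):
                x[i] += JUMPS[k][i] * incr
        path.append(tuple(x))
    return path

# Hypothetical parameters (beta, gamma, a12, a21).
params = (0.001, 0.5, 0.0001, 0.0002)
a, b = drift_and_diffusion((100, 10, 200, 5), params)
print("drift:", a)
path = euler_maruyama((100, 10, 200, 5), params, t_end=1.0, dt=0.1, rng=random.Random(0))
print("state after one time unit:", path[-1])
```

Under these assumed hazards the diffusion matrix is block diagonal, with one 2 × 2 block per node, because no single event changes the state of both nodes.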
2.3. Linear Noise Approximation
The LNA provides a tractable approximation to the SDE given in (1)–(5). We used the LNA in the same manner for a stochastic SIR model with a time-varying infestation rate, also applied to OPM, in [9]. Formal details of the LNA can be found in [28,29,30]; below we outline the derivation.
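Consider a partition of $X_t$ as

$$X_t = \eta_t + R_t, \tag{6}$$

where $\eta_t$ is a deterministic process satisfying the ordinary differential equation (ODE)

$$\frac{d\eta_t}{dt} = a(\eta_t, \theta), \tag{7}$$

and $R_t$ is a residual stochastic process. The residual process $R_t$ satisfies

$$dR_t = \{a(X_t, \theta) - a(\eta_t, \theta)\}\,dt + \sqrt{b(X_t, \theta)}\,dW_t,$$

which will typically be intractable. The assumption that $\|X_t - \eta_t\|$ is “small” motivates a Taylor series expansion of $a(X_t, \theta)$ and $b(X_t, \theta)$ about $\eta_t$, with retention of the first two terms in the expansion of a and the first term in the expansion of b. This gives an approximate residual process $\hat{R}_t$ satisfying

$$d\hat{R}_t = F_t \hat{R}_t\,dt + \sqrt{b(\eta_t, \theta)}\,dW_t,$$

where $F_t$ is the Jacobian matrix with (i, j)th element

$$(F_t)_{i,j} = \frac{\partial a_i(\eta_t, \theta)}{\partial \eta_{j,t}}.$$

For the SIR model in (1)–(5), $F_t$ follows by differentiating the drift function in (2) with respect to each component of the state.

Given an initial condition $\hat{R}_0 \sim N(0, V_0)$, it can be shown that $\hat{R}_t$ is a Gaussian random variable [31]. Consequently, the partition in (6) with $R_t$ replaced by $\hat{R}_t$, and the initial conditions $\eta_0 = x_0$ and $V_0 = 0$, give

$$X_t \mid X_0 = x_0 \sim N(\eta_t, V_t), \tag{8}$$

where $\eta_t$ satisfies (7) and $V_t$ satisfies

$$\frac{dV_t}{dt} = V_t F_t^\top + b(\eta_t, \theta) + F_t V_t. \tag{9}$$

Further details on the derivation of (9) are given in [9]. Hence, the linear noise approximation is characterised by the Gaussian distribution in (8), with mean and variance found by solving the ODE system given by (7) and (9), which can be solved numerically.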
2.4. Bayesian Inference
We consider the case in which not all components of the stochastic epidemic model are observed and the data points are subject to measurement error, as in [9]. Observations (on a regular grid) are assumed conditionally independent (given the latent process), with conditional probability distribution obtained via the observation equation (10), in which the matrix P is given by (11). This choice of P is due to the data consisting of observations on the removed states in each node, $R_{1,t}$ and $R_{2,t}$, which, for known population sizes $N_1$ and $N_2$, is equivalent to observing the sums $S_{1,t} + I_{1,t}$ and $S_{2,t} + I_{2,t}$. Our choice of observation model is motivated by a Gaussian approximation of two independent Poisson random variables with rates given by these sums. Moreover, the assumption of a Gaussian observation model admits a tractable observed data likelihood function when combined with the LNA (see Section 2.3 and [31,32]) as a model for the latent epidemic process. Details on a method for the efficient evaluation of this likelihood function can be found in [9].
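A Gaussian observation model of this kind can be sketched as below. The specific variance form (the square of an observation error `sigma` times the mean, so that `sigma = 1` gives the usual normal approximation to a Poisson count), the state ordering (S1, I1, S2, I2), and all numbers are assumptions for illustration and may differ from the observation equation used in this work.

```python
import math

# Assumed state ordering X = (S1, I1, S2, I2); each row picks out S_i + I_i for one node.
P_T = [[1, 1, 0, 0],
       [0, 0, 1, 1]]

def observed_mean(x):
    """Mean of the observation model: the sums S1 + I1 and S2 + I2."""
    return [sum(p * xi for p, xi in zip(row, x)) for row in P_T]

def log_obs_density(y, x, sigma):
    """Log density of two independent Gaussians with mean m_i and variance sigma^2 * m_i."""
    total = 0.0
    for yi, mi in zip(y, observed_mean(x)):
        var = sigma ** 2 * mi
        total += -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (yi - mi) ** 2 / var
    return total

# Hypothetical example: population sizes of 1000 per node and cumulative removals of 12 and 7,
# so the observed sums are N - R for each node.
y = [1000 - 12, 1000 - 7]
x = (975.0, 15.0, 990.0, 4.0)  # a candidate latent state
print(observed_mean(x))
print(log_obs_density(y, x, sigma=1.0))
```

Because both the observation model and the LNA are Gaussian, a density of this form can be combined with the LNA mean and variance to evaluate the likelihood in closed form.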
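Given data $y$ and upon ascribing a prior density $\pi(\theta)$ to the components of $\theta$, Bayesian inference proceeds via the joint posterior for the static parameters $\theta$ and unobserved dynamic process $x$. We have that

$$\pi(\theta, x \mid y) \propto \pi(\theta)\,\pi(y \mid \theta)\,\pi(x \mid y, \theta), \tag{12}$$

where $\pi(y \mid \theta)$ is the observed data likelihood and $\pi(x \mid y, \theta)$ is the conditional posterior density of the latent dynamic process. Since the joint posterior is intractable, we use a Markov chain Monte Carlo scheme for generating (dependent) samples from (12). Briefly, this comprises two steps: (i) the generation of samples $\theta^{(1)}, \ldots, \theta^{(M)}$ from the marginal parameter posterior $\pi(\theta \mid y) \propto \pi(\theta)\,\pi(y \mid \theta)$ and (ii) the generation of samples $x^{(1)}, \ldots, x^{(M)}$ by drawing from the conditional posterior $\pi(x \mid y, \theta^{(i)})$, $i = 1, \ldots, M$.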
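The first of the two steps, sampling parameters from a marginal posterior known only up to a constant, can be illustrated with a random-walk Metropolis sampler. The sketch below is not the scheme used in this work: it replaces the LNA-based likelihood with a deliberately simple toy model (exponentially distributed durations with an unknown rate), and the data, prior, step size, and function names are all hypothetical.

```python
import math
import random

def log_posterior(phi, durations):
    """Unnormalised log posterior for phi = log(rate) in a toy model:
    durations ~ Exponential(rate), with a N(0, 10^2) prior on phi."""
    rate = math.exp(phi)
    log_lik = len(durations) * phi - rate * sum(durations)
    log_prior = -0.5 * (phi / 10.0) ** 2
    return log_lik + log_prior

def random_walk_metropolis(log_target, start, step, n_iter, rng):
    """Gaussian random-walk Metropolis; returns the chain of (dependent) samples."""
    current, current_lp = start, log_target(start)
    samples = []
    for _ in range(n_iter):
        proposal = current + step * rng.gauss(0.0, 1.0)
        proposal_lp = log_target(proposal)
        # Accept with probability min(1, target ratio); 1 - random() lies in (0, 1].
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            current, current_lp = proposal, proposal_lp
        samples.append(current)
    return samples

# Hypothetical data: 20 durations.
durations = [1.2, 0.4, 3.1, 2.2, 0.9, 1.7, 2.8, 0.6, 1.5, 2.0,
             3.4, 1.1, 0.8, 2.6, 1.9, 2.3, 0.7, 1.4, 2.9, 1.6]
chain = random_walk_metropolis(lambda phi: log_posterior(phi, durations),
                               start=0.0, step=0.5, n_iter=20000, rng=random.Random(42))
kept = [math.exp(phi) for phi in chain[2000:]]  # discard burn-in, map back to the rate
posterior_mean_rate = sum(kept) / len(kept)
print("posterior mean rate:", posterior_mean_rate)
```

Sampling on the log scale keeps the rate positive without needing a constrained proposal. In the full scheme, each retained parameter draw would then be used to draw the latent process from its conditional posterior.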
The parameters required as input for the inference scheme are given in Table 1. We take estimates of the number of trees in each park (Bushy and Richmond) and the initial conditions for the infestation and the ODE system as in [9].
The methodology described above overcomes the challenge of the data being partially observed and handles a relatively short observed time series (in this case, nine data points). This is transferable to other epidemic datasets. If the available data are more limited, this could result in larger uncertainties in the plausible ranges of the estimated parameters.
3. Results
We take the data, detailed in Section 2.1 and pictured in Figure 1, for the cumulative number of trees with removed OPM nests in Bushy Park and Richmond Park. We (arbitrarily) denote Bushy Park as node 1 and Richmond Park as node 2, with the observed data corresponding to the removal prevalence time series of each node in the two-node model described in Section 2.2. Through the inference techniques outlined in Section 2.3 and Section 2.4, we infer the parameters for the two-node stochastic epidemic model: the infestation rate and removal rate, common to both nodes, along with additional parameters representing the infestation ‘pressure’ resulting from the neighbouring node, and an observation error.
The results from the inference scheme are shown in Figure 3, which gives the within-sample median posterior series and the posterior densities of the inferred parameters. The average parameter results are shown in full in Table 2, including the median infestation rate, the median removal rate, the median observation error (see (10)), and the median posterior estimates of the parameters connecting the two nodes (Bushy to Richmond, and Richmond to Bushy).
We can consider each of the infestation components (see (2)): the standard intra-park component and the connecting inter-park component. Probability densities for both the intra- and inter-park infestation components, using the median posterior estimates of the compartment states and the full posterior distributions of the infestation parameters, are shown in Figure 4. Similarly, the expected number of new infestations from each of the two components, averaged over 50 forward simulations with the median parameters estimated through the inference scheme (Table 2), is shown in Figure 5. Both Figure 4 and Figure 5 illustrate that in Bushy Park the inter-park dynamics (Richmond–Bushy) are significantly greater than the intra-park dynamics (Bushy–Bushy), whereas in Richmond Park the intra-park dynamics (Richmond–Richmond) are more significant than the inter-park dynamics (Bushy–Richmond). This suggests that the infestation in Bushy Park has been driven by the infestation in Richmond Park to a greater extent than vice versa.
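A split of new infestations by source, averaged over repeated forward simulations, can be sketched as follows. The sketch treats intra-park and inter-park infestation as separate event types in an exact stochastic simulation and counts them per year. The hazard forms, the parameter names `beta`, `gamma`, `a12`, `a21`, the numerical values, and the use of years as the time unit are assumptions for illustration; they are not the estimates reported in Table 2.

```python
import random

def simulate_components(state, beta, gamma, a12, a21, years, rng):
    """One forward simulation; returns per-year counts of new infestations as
    [intra node 1, inter into node 1, intra node 2, inter into node 2]."""
    s1, i1, s2, i2 = state
    counts = [[0, 0, 0, 0] for _ in range(years)]
    t = 0.0
    while True:
        rates = [beta * s1 * i1,  # intra-park infestation, node 1
                 a21 * s1 * i2,   # inter-park infestation, node 2 -> node 1
                 beta * s2 * i2,  # intra-park infestation, node 2
                 a12 * s2 * i1,   # inter-park infestation, node 1 -> node 2
                 gamma * i1,      # removal, node 1
                 gamma * i2]      # removal, node 2
        total = sum(rates)
        if total <= 0.0:
            break
        t += rng.expovariate(total)
        if t >= years:
            break
        e = rng.choices(range(6), weights=rates)[0]
        if e < 2:
            s1, i1 = s1 - 1, i1 + 1
        elif e < 4:
            s2, i2 = s2 - 1, i2 + 1
        elif e == 4:
            i1 -= 1
        else:
            i2 -= 1
        if e < 4:
            counts[int(t)][e] += 1  # attribute the new infestation to its source
    return counts

def expected_components(state, params, years, n_sims=50, seed=0):
    """Average the per-year component counts over n_sims independent simulations."""
    rng = random.Random(seed)
    totals = [[0.0] * 4 for _ in range(years)]
    for _ in range(n_sims):
        run = simulate_components(state, *params, years, rng)
        for y in range(years):
            for c in range(4):
                totals[y][c] += run[y][c] / n_sims
    return totals

# Hypothetical initial state and parameters (beta, gamma, a12, a21).
for year, row in enumerate(expected_components((990, 10, 995, 5), (1e-4, 0.5, 1e-5, 1e-5), years=3)):
    print(year, row)
```

Comparing the first two columns (node 1) or the last two (node 2) shows whether a node's new infestations are mainly driven internally or by its neighbour, under the assumed parameters.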
Predictions of the spread of OPM are required to inform management strategies. We can use the inferred model parameters to simulate the infestation forwards in time. The simulated removal prevalence time series resulting from the stochastic two-node epidemic model with the median inferred parameter estimates is shown in Figure 6 for the years 2013 to 2025. The simulation aligns with the observational data overall, apart from a deviation between 2017 and 2020, during which the observed infestation numbers were lower than the model predicts. Figure 6 also shows the forecast that results if the infestation were to continue with these characteristic parameters.
A possible contributing factor to the dynamics, not considered explicitly in this epidemic model, is the density of OPM nests within each park, i.e., the number of nests per infested tree, shown in Figure 7. Richmond Park has a slightly higher infestation density, with a median of three nests per infested tree, compared to a median of two nests per tree in Bushy Park, and an upper quartile of six nests per tree, compared to an upper quartile of four in Bushy Park. This increased nest density could contribute to the resulting infestation pressure from the Richmond Park infestation on the Bushy Park infestation.
4. Discussion
It is crucial to deepen our understanding of the dynamics of invasive pests to develop predictive modelling tools and maximise the impact of control strategies. Here, we show the applicability of two-node compartmental epidemic models, using case study data of the OPM infestations within the neighbouring Bushy Park and Richmond Park in London.
In our previous work exploring the UK OPM infestation [9], we considered the two parks as independent contained areas and estimated the parameters for a compartmental model with a time-varying infestation rate, showing that the infestation rate had remained stable over time. Here we challenge the assumption that the infestations within the two parks are independent, given their geographical proximity (with the closest park boundaries separated by approximately 2–3 km).
Instead, we assume that the infestation contact rate within each park (assumed to be constant based on the results from [9]) and the removal rate are inherent properties of the OPM species under similar conditions, resulting in identical contact and removal rates in both nodes (parks). Since the contact rate can be expressed in terms of the number of contacts and a transmissibility parameter inherent to the species, the number of contacts scales with the total population number. In the case of OPM, this corresponds to an increasing number of ‘contacts’ between trees due to the underlying movement of the OPM population, an assumption that would hold whilst the typical movement distances of OPM are of a similar (or greater) length scale to the tree population area, i.e., a greater number of trees (increased N) means a greater number of opportunities for contacts, provided the moths can travel over the whole area. Despite the shared contact and removal rates, the infestation dynamics can still differ in each park due to the different numbers of susceptible trees and the introduction of two additional parameters connecting the infestations within the two nodes.
Similar epidemic models have previously been used to describe the spread of infectious disease within human populations, e.g., for Ebola in [25]. In these cases, the parameters connecting the nodes represent the rates of the movement of individuals between nodes, e.g., the movement of infected people between geographical locations. In the case of our OPM model, the individuals within each of the SIR compartments are trees, which are spatially fixed and, thus, each connecting parameter instead represents a proxy measure of the movement of the underlying OPM population between the two nodes. The movement patterns of OPM are not fully characterised, but movement can occur through three possible routes: short-distance larval movement, flight of adult moths, and accidental human-mediated dispersal [13,33]. Given the locations of the two parks considered here, the latter two dispersal mechanisms are possibilities for facilitating moth movement between nodes.
We find that the connecting parameter representing the infestation pressure on the trees in Bushy Park resulting from the infestation in Richmond Park is much stronger than vice versa. The infestation in Richmond Park is relatively unaffected by the infestation in Bushy Park. One reason for the observed Richmond-led infestation dynamics could be the higher underlying nest density within the park, resulting in a greater population density of OPM per susceptible tree, a larger contact number, and more opportunities for longer-range movement between the two parks. The relationships between OPM nest density, tree height, infestation percentages, and bacterial control treatment are explored in [34].
We note that there are control measures taking place in many OPM-infested areas [35]. In Bushy and Richmond Parks, control measures include the yearly nest removal (leading to the data used here) and limited spraying with a biological insecticide, which has been shown to reduce nest density [34]. Thus, all inferred parameters represent the infestation dynamics under these conditions, rather than inherent parameters of uncontrolled pest spread. Although not ideal for learning about the fundamental properties of the species, this is necessary as emergent invasive pests require an immediate control response.
Future work could explore expanding the epidemic network to a greater number of areas (nodes), forming more connections across the wider OPM-infested area of south-east England. If a similar approach to network building were taken to that described in [19], the results from the statistical compartmental epidemic model could be compared with the electric network theory model [19]. However, expanding the network represents a computational challenge, with increasing numbers of parameters to infer.
The results from this work can inform the development of future computational models for the spread of OPM, and provide a statistical framework that can be applied to other emerging pest concerns with similar partially observed temporal data sets.