Proof-of-Principle That Cellular Automata Can Be Used to Predict Infestation Risk by Reticulitermes grassei (Blattodea: Isoptera)

: Over the past few decades, species distribution modelling has been increasingly used to monitor invasive species. Studies herein propose to use Cellular Automata (CA), not only to model the distribution of a potentially invasive species but also to infer the potential of the method in risk prediction of Reticulitermes grassei infestation. The test area was mainland Portugal, for which an available presence-only dataset was used. This is a typical dataset type, resulting from either distribution studies or infestation reports. Subterranean termite urban distributions in Portugal from 1970 to 2001 were simulated, and the results were compared with known records from both 2001 (the publication date of the distribution models for R. grassei in Portugal) and 2020. The reported model was able to predict the widespread presence of R. grassei, showing its potential as a viable prediction tool for R. grassei infestation risk in wooden structures, providing the collection of appropriate variables. Such a robust simulation tool can prove to be highly valuable in the decision-making process concerning pest management.


Introduction
Given the importance of monitoring potential pest species and understanding their responses to different environmental conditions, species distribution modelling has assumed an increasingly significant role over the last few decades [1,2]. Species distribution models have attempted to disentangle species-environment relationship by using sites of known occurrence and non-occurrence, together with environmental variables over a wider study area [3]. However, data relating to true absence are rarely available, as the absence of a given species may not be clearly distinguished from a lack of record. Hence, presence-only datasets are commonly used in analytical biology [4].
In order to model these types of datasets, several statistical methods have been proposed [4,5]. For this study, Cellular Automata (CA) [6] was selected, because of its inherent simplicity and versatility, as it can simulate a variety of real-world systems, including biological and ecological ones. CA is defined as being an infinite set of finite automata spatially organized in cells. Each automaton can evolve between discrete states defined by transition functions that consider the states of the neighboring automata [7]. Despite having a relatively simple conceptual basis, CA can handle very complex ecological systems and, as a result, is popular for studying spatially extended dynamics [7].
Reticulitermes Holmgren is a Holarctic genus of subterranean termite indigenous to North America, the Mediterranean and Black Sea regions eastern Asia [8]. Both within their native range and invaded areas, several Reticulitermes species have pest status because they use wood and woody materials as their food source; their high economic impact makes them one of the major deterioration agents of wood in buildings [9].
Subterranean termites are social insects organized in a caste system (colony) where each cast plays a certain role: the workers, which look for food as part of their role of sustaining the colony; the soldiers, which mainly defend the colony from predators; the winged reproductives, whose mission is mainly the colony dispersion [10]. They require contact to soil and moisture for the colony to establish and live. They also have negative phototropism and, as a result, they usually live in galleries underground (or inside the wood), built by themselves [10]. This behavior makes it difficult to find and control them, as often an infestation is only detected at a relatively late stage.
The genus Reticulitermes Holmgren is suggested to be originally from the West European zone [11], although these termites are widespread globally, which is due to their invasive traits that increase the probability of creating viable propagules [12]. In Europe, seven Reticulitermes species have been described as native to the region, covering a Mediterranean strip ranging from the Iberian Peninsula to north-eastern Greece, Crete and Cyprus [9,[13][14][15][16][17][18]. Termite damage to wooden structures and agriculture products, in these areas, are increasing, and colonization in areas more northerly to their natural range, together with impacts of climate change are concerning. The highly invasive Reticulitermes flavipes (Kollar) is spreading its range in Europe [19] and has already been identified in the most westerly points of Europe, including the Azorean islands [8,20] and Tenerife, with their voracious presence in the latter creating havoc in 2019 [21].
In mainland Portugal, the only subterranean termite that has been identified to date belongs to the species Reticulitermes grassei Clément [9] which was, up until 2006, included in the complex Reticulitermes lucifugus. R. grassei originated from southwestern Europe (Iberian Peninsula and France) and was identified in Britain and in Faial Island of the Azores (2000) [20,22]. More recently, R. grassei has also been identified in Zurich, Swizerland, a country with no native termites [23]. The first written reports of termites in Portugal date back to the early 20th Century [24], but their presence and impact in buildings has since been the object of several different reports, including initial models of distribution [25]. The importance of this species in the conservation of historic buildings has also been reported, together with the last published distribution maps [26].
In this work, termite distribution in urban areas in Portugal from 1970 to 2001 (the most recent date for the publication of distribution models for R. grassei in Portugal [25]), was processed using Cellular Automata in a first approach to develop a simulation tool that can be used to help predict R. grassei infestation risk. At this stage, we do not aim to add extra data to those previously published, but to discuss the potential of the approach towards this future development of a truly infestation risk predictor.

Materials and Methods
An evaluation of termite distribution in the Portuguese mainland from 1970 to 2001 was undertaken to develop a simulation tool that can be used to predict R. grassei infestation risk.
To define the initial conditions of the simulation, an account of termite existence reports up until 1970 was carried out. This led to a representation of the Portuguese counties which have reports of termite occurrence prior to the defined starting date ( Figure 1a).
Consequently, using the Universal Transverse Mercator (UTM), as previously reported by Nobre and Nunes in 2001 [25], the Portuguese mainland territory was divided into 50 × 50 km cells (Figure 1b) so that a Cellular Automata could be developed and simulated throughout time. At this stage, the choice for 50 × 50 km cells was made taking into consideration the available supporting information and its resolution power (lower cell sizes, albeit potentially more accurate, lack data; bigger cell sizes lack the discriminatory powers of the variables considered). The Portuguese mainland landscape was represented and simulated as a 11 × 6 grid. This led to the identification of the cells in which termite presence could be considered, according to the counties these cells encompass ( Figure 2). In addition, the cells in which no Portuguese territory existed were programmed to never have termites (the probability of a transition from not having termites to having termites was considered 0). This adjustment made the area considered for the simulation 112.500 km 2 , a much closer value to the real 92.212 km 2 , when comparing to the 165.000 km 2 considered prior to this adjustment. Markov chains were considered for each cell, allowing the creation of a stochastic model to predict the cells' next state. The transition matrix of each cell was different, as the conditions of the encompassed territory changed.
For each cell, two different states were defined: (1) Termites are present; (2) Termites are not present; the transition between the two was dependent on both the cells characteristics, i.e., human population (HP), insolation (IS), days of rainfall in a year (DR) and leptosols (LS), and the presence/absence of termites in the surrounding cells. The elements taken into consideration to analyze the cells' probability to host termites were the ones that Nobre and Nunes [25] found as the most relevant to calculate the probability of the presence of R. grassei, which is given by: where z is given by: However, in order for termite presence in the surrounding cells to be incorporated into the probability of R. grassei presence, the following formula was postulated: with NE being a neighborhood termite presence and possibly exhibiting two different potential values (0 or 1, depending on whether termites exist in at least one of the neighboring cells, while considering a Moore neighborhood, i.e., the 8 surrounding cells around a central square on a two-dimensional lattice).  To calculate z, the values of HP, LS, IS, and DR must be defined in classes, as shown in Tables 1, 2, 3, and 4, respectively. The classification of each cell that encompassed more than one county, regarding the LS, IS, and DR, was done according to class prevalence, meaning that the class given to a cell was the one which represented most of the counties in the given cell. For HP, the total inhabitants of the counties in a given cell were calculated, and the class given to that cell was done accordingly.  After class fitting, the probabilities for each cell were calculated and used in the simulations. The simulation tool was developed in Python 3.8 [27], and the modules used for the simulations were Pandas [28] and NumPy [29].
Additionally, a simple map with the R. grassei presence, as reported up to 2020 on a termite occurrence data base maintained by the National Laboratory for Civil Engineering (LNEC), was constructed, to further evaluate our predictions. The rationality behind this is based on the reported observation of termite occurrence being linked to the infestation of wood in buildings, which implies a delay of the reporting process and results in a cumulative dataset where current reported incidences likely reflect events that initiated years before.

Results
The 31-year span simulation resulted in the termite distribution throughout mainland Portugal presented in Figure 3. According to the simulation, termite distribution in the Portuguese mainland should have been almost complete, with termites being expected to exist in all but three counties of the Portuguese mainland territory (Figure 3). The lower probability for termite infestation to occur in the three north-east counties resulted from the prevalent landscape conditions combined with the reduced number of neighboring cells' a consequence to it being in the extremities of the simulated model.

Discussion
The model herein predicted the widespread risk of presence of R. grassei, with the exception of a single cell. Interestingly, this cell corresponded to the lowest population density (the cells class for HP was 6). Furthermore, the region showed a relatively large area of leptosols, characterized by either a very shallow soil over hard rock or a deeper soil that was extremely gravelly and/or stony (leptosols class 3). Either way, these are soils that theoretically hinder subterranean nesting and foraging activities, and hence lead to a decrease in that cell's probability to host subterranean termites.
Nevertheless, as Nobre and Nunes [25] mentioned, there is a clear association between the data used and human variables, meaning that the model (and, consequently, the simulation tool which was based on it) is not supposed to be used to address the question of natural termite distribution nor of habitat selection, but only of the association between human societies development and termite infestation of buildings.
Based on the previously found relevant ecological variables to R. grassei distribution [25], the developed model predicted a more widespread presence of the termites than previously reported in 2001. This is, however, not surprising and can be described by two factor classifications: Inherent to the CA Approach (a) The classification of each cell for LS, IS, and DR was undertaken according to the most frequent class present in the totality of the region. However, if smaller cells had been considered, the classes of each zone might have been different, which would have led to larger, or smaller, probabilities for termite infestation; (b) Relatively large cells were considered; thus, the HP class of each cell was invariably high (larger cells comprehend more counties and, hence, more people), meaning that the class given for a specific cell was, most of the times, larger than 8. This had a very significant impact as, according to Nobre and Nunes [25], HP weight in the probability for R. grassei existence increases exponentially with class increase; (c) Termite presence in the neighboring cells was postulated to double the probability of termite establishment in each cell. However, this value was not mathematically estimated, meaning that the real predicted influence on the R. grassei probability could possibly be different (and vary with the number of surrounding cells with termite pest infestation, something that was also not considered in the developed simulation tool).

Inherent to Data Collection and Variables Used
(a) Almost every reported observation of termites was performed due to an infestation of wood in buildings. This means that there could be many other locations where termites existed (and were deteriorating agents of wood in buildings) but were not reported to date. This would suggest that, as years go by and the number of reports increase, the number of affected counties should also increase. This is exactly what has happened and, nowadays, R. grassei distribution in applied wood in mainland Portugal covers a much larger area (as can be seen in Figure 4b). (b) The variables used were ecological variables only, whereas the presence data were almost exclusively from infestations reports. No variables considering timeframe and infestation dynamics as well as on the likelihood of infestation report and time since first occurrence were considered.
The potential of the approach is clear, albeit the indicated need of improvement towards a better fitting infestation model. For designing such a model, it is necessary to include meaningful variables related to building materials, building physics, infestation dynamics, and human infestation tolerance threshold. To achieve this, a stepwise approach similar to that undertaken with the ecological variables is suggested. Once the significant variables are identified, the CA approach could be applied towards a more robust R. grassei infestation risk map.
Enhancement of prediction capability could also be achieved with a more refined size cell. As such, it would be relevant to simulate the Portuguese mainland landscape with progressively but feasible smaller cells to select the scale in which no more significant predictability improvement could be reached. Needless to say, the simulation tool should be updated with a new and revised mathematical model for the probability of R. grassei existence that includes environmental and ecological variables and building and sociological ones. This should be done alongside the incorporation of a term that considers the number of neighboring cells where termites are present to calculate the likelihood of termite presence at a given time point. Finally, it would be interesting to also add a graphical interface for a better visualization of the termite population in the considered grid throughout time. Once such a robust infestation risk model is obtained, similar postulations considering different climate change scenarios may be determined, thus increasing its value as a termite pest management tool.

Conclusions
The possibility of predicting the infestation risk by R. grassei using CA has been shown. Such a simulation tool, once optimized, will be useful in the control of termite infestation. The prediction of an event before it happens is an extremely valuable tool in decision making within a sustainable pest management framework. However, there are still several improvements that need to be undertaken for the developed simulation model to be considered as a viable prediction tool for R. grassei infestation risk map.