Passive Acoustic Monitoring as a Tool to Investigate the Spatial Distribution of Invasive Alien Species

: Invasive alien species (IAS) are a threat to biodiversity and ecosystem function worldwide. Unfortunately, researchers, agencies, and other management groups face the unresolved challenge of effectively detecting and monitoring IAS at large spatial and temporal scales. To improve the detection of soniferous IAS, we introduced a pipeline for large-scale passive acoustic monitoring (PAM). Our main goal was to illustrate how PAM can be used to rapidly provide baseline information on soniferous IAS. To that aim, we collected acoustic data across Puerto Rico from March to June 2021 and used single-species occupancy models to investigate species distribution of species in the archipelago and to assess the peak of vocal activity. Overall, we detected 16 IAS (10 birds, 3 mammals, and 3 frogs) and 79 native species in an extensive data set with 1,773,287 1-minute recordings. Avian activity peaked early in the morning (between 5 a.m. and 7 a.m.), while amphibians peaked between 1 a.m. and 5 a.m. Occupancy probability for IAS in Puerto Rico ranged from 0.002 to 0.67. In general, elevation and forest cover older than 54 years were negatively associated with IAS occupancy, corroborating our expectation that IAS occurrence is related to high levels of human disturbance and present higher occupancy probabilities in places characterized by more intense human activities. The work presented here demonstrates that PAM is a workable solution for monitoring vocally active IAS over a large area and provides a reproducible workflow that can be extended to allow for continued monitoring over longer timeframes.


Introduction
Invasive alien species (IAS) are one of the largest threats to biodiversity around the world [1]. It is estimated that the global costs of damage caused by IAS exceed hundreds of billions of U.S. dollars a year [2,3] and that IAS have negatively affected more than a hundred critically endangered native terrestrial vertebrates [4]. The number of established IAS has exponentially increased during the last century for different biological groups [5], and climate change is expected to further expand the distribution of some invasive species [6]. After reaching and becoming established in a new area, IAS can impact local biodiversity through direct and indirect negative interactions with native taxa, such as predation, competition, disease spread, predator poisoning, and altering habitat characteristics [2]. The impact of introducing an alien species can be even more dramatic for native species on islands, which may experience rapid extirpation of native fauna after the arrival and establishment of an IAS [7][8][9]. Insular species often have small populations and home ranges, low genetic diversity, and lack morphological adaptations against IAS [7]. Islands in warmer regions of the globe are hotspots for IAS and tend to have more established invasive species than mainland areas, generating a profound concern for the conservation of native species on tropical islands [5,8].
One of the main obstacles that wildlife managers and conservationists face in opposing the threat of invasive species is acquiring rapid, reliable, large-scale baseline information on the distribution of fauna-data which are critical in guiding effective wildlife management programs [10]. A central challenge to this type of monitoring is how to cover a large sampling area with a limited number of researchers (an issue faced by many projects and agencies) and still complete surveys in a short period of time, which has the dual benefit of avoiding temporal bias and providing distribution data quickly. Moreover, the success of IAS control or eradication efforts is typically greater when species are detected in an early stage before they become established in a new location [11]. Therefore, the development of early-stage detection systems is urgent in order to identify IAS before they can become a significant threat [12][13][14]. The rapid evolution of remote sensing tools has made it possible to conduct large-scale monitoring in a short period of time, providing detailed baseline information of ecosystems and biodiversity [15].
Remote sensing through satellite and airborne images has been successfully used to detect and monitor changes in forest cover, vegetation type, disturbance regimes, plant phenology, and ecosystems [16][17][18][19][20]. However, remote sensing is still mostly overlooked for invasive fauna, though camera traps and autonomous acoustic devices have been successfully used to detect and monitor alien animal species [15,21,22]. Other noninvasive methods such as eDNA have also been used recently to detect IAS [21,22]. The emergence of new autonomous recording units (ARUs) and platforms to store and analyze massive amounts of audio data has greatly improved the utility of passive acoustic monitoring (PAM) to monitor soniferous wildlife and its threats [23][24][25][26][27][28]. Taxa that regularly produce species-specific vocalizations, such as birds, anurans, bats, insects, and some mammals, are well suited for ecoacoustic surveys [29,30]. Although PAM has great potential for monitoring biodiversity and researching a variety of ecological issues [31], it has not been used yet to its maximum potential for investigation of soniferous terrestrial IAS [32,33]. However, PAM has been used for investigating the occurrence of invasive freshwater drum in the New York State Canal System and assessing the phenology of the invasive cane toad in Australia [34,35].
Passive acoustic monitoring has numerous benefits for rapid assessment and early detection of sound-producing fauna, as well as large-scale and long-term monitoring, and can decrease the response time for managing soniferous IAS [28,31,33,36]. Amongst the primary benefits, PAM is a standardized noninvasive survey method that can be used simultaneously in numerous locations, allowing for the monitoring of hundreds to thousands of sites at the same time, which would otherwise be impossible if trained researchers were required to be in the field at each location [36]. Additionally, PAM facilitates sampling during all periods of the day, covering the peak of activity of different taxa; devices can be easily deployed with little or no specialized training; and recordings can be permanently stored, providing insights on temporal patterns of biodiversity [36,37].
Passive acoustic monitoring generates the detection and non-detection data required for species distribution models, one of the most used tools for IAS risk assessment [38][39][40][41][42]. Due to the substantial volume of data collected by the ARU in the field, one constraint on PAM has been the ability to examine all of the recordings collected. As a result, the development of protocols that optimize inspection of recordings together with semiautomatic or automatic techniques that speed up data analysis is necessary to extract the maximum amount of information from the acoustic data collected. In addition, to expedite biological and ecological insights from data acquired through PAM, it is important to develop user interface tools that allow availability, visibility, and management of results for the nonacademic public (e.g., citizen scientists, birdwatchers, practitioners, and wildlife managers).
Here, we used a large-scale PAM survey across the Puerto Rican archipelago (841 sampling sites) to investigate the spatial distribution and peak of vocal activity of IAS.
Our main goal was to generate baseline population data for soniferous native and IAS in Puerto Rico. Moreover, we provide a roadmap on how remote soundscape data collection can be used to rapidly provide distribution information for soniferous species through a free, user-friendly web interface designed to be accessible to diverse stakeholder groups as biologists, ecologists, wildlife managers, and citizen scientists. Our workflow offers a balance between manual inspection of recordings with semiautomatic analysis, which considerably reduces the time required to analyze the large amount of data typically generated by PAM, and providing assessment for the peak activity of species and detection/nondetection data for each day and site sampled. To assess the status of IAS populations in Puerto Rico, we used Bayesian single-species occupancy models to investigate how the spatial distribution of IAS varied through environmental gradients. The last step of our end-to-end pipeline for large-scale passive acoustic monitoring consisted in the development of a web page connecting and summarizing data from the Arbimon platform to display and share the results of the ecoacoustic analyses and close the gaps between academia and wildlife managers and decision makers. We expect that our study can provide baseline information on soniferous IAS for wildlife managers and decision makers in Puerto Rico and support future research focused on the rapid assessment or long-term monitoring of sound-producing wildlife across a broad spatial area using acoustic survey methods.

Passive Acoustic Monitoring
The Caribbean is a hotspot of biodiversity [43]; unfortunately, the islands in the region are also hotspots of established IAS [8]. Amongst them, Puerto Rico has a long history of IAS, which raises concerns about the prior and future impacts of those species on the native and endemic flora and fauna [44,45]. There is a gap in the knowledge of population assessment of IAS in the archipelago, primarily for birds and frogs. We conducted a large-scale PAM survey in the Puerto Rican archipelago, including the main island of Puerto Rico and the major offshore islands of Culebra, Desecheo, Mona, and Vieques (Appendix A). Overall, we successfully deployed ARUs at 841 sites: 198 sites in and around six preselected regions in the interior of Puerto Rico and 643 sites distributed in 14 regions across coastal areas in the main island of Puerto Rico (n = 485 ARU), Culebra (n = 49), Desecheo (n = 7), Mona (n = 52), and Vieques (n = 50). The selection of sampling sites within each region and protected area followed an approximate systematic sampling design to capture the full range of conditions found there.
We made use of lightweight and low-cost Open Acoustic Devices AudioMoth recorders to record the soundscapes across Puerto Rico [26,36]. We used the Open Acoustic Devices AudioMoth Configuration App to program the audio devices to record 1 min every 5 min with mean gain and at a 48 kHz sampling rate. Four teams of two field agents deployed AudioMoths between March 1 and June 6, 2021. The ARUs were placed in a protective waterproof plastic case and then affixed to a tree/vine/shrub using wooden clothespins at a height of 1.5 m (Appendix A). A total of approximately 200 AudioMoths were simultaneously in the field at any given time during the survey period; the AudioMoths were rotated across sampling sites after about 1 week of data collection, with ARUs simultaneously in the north, south, east, and west of the island at all times. The AudioMoths recorded in the field for at least seven consecutive days; thereafter, field agents revisited each site to retrieve the ARUs that had been deployed and move them to a new location. Due to logistical issues and the occasional malfunction of the devices, the number of days recorded at a site ranged from 1 to 11 (mean = 8.3, sd = 1.11). The main factors contributing to data loss or site failure were device malfunction (n = 10) and theft (n = 4).
We made an effort to deploy the ARUs no less than 200 m apart from each other, which was an adequate distance to ensure that a calling event of most species was not recorded by more than one device simultaneously (the great majority of the distances between nearest points were above 500 m; mean = 865.03 m, sd = 714.05). Moreover, we expected that 200 m was a good proxy of the home range of most low-vagility forest birds and frogs, ensuring site independence for these species. Field agents manually documented the date, site ID, recorder ID, latitude, longitude, and additional relevant climate and environmental factors. The Rainforest Connection (RFCx) Companion App was also used during AudioMoth deployments to sync the ARUs and add site metadata (e.g., GPS coordinates, elevation, photos, and notes) with sites in the RFCx Arbimon web platform (https://apps.apple.com/us/app/rainforest-connection/id1178078181 (accessed on 1 March 2021)). Once the ARUs were retrieved, all recordings were uploaded to RFCx Arbimon web-based platform, where all the sound files are now permanently stored (https://arbimon.rfcx.org/project/puerto-rico-island-wide/dashboard (accessed on 5 July 2021)).

Semiautomatic Species Identification Workflow
Large-scale and long-term investigations using ARUs commonly generate a colossal number of recordings [27], quantities that quickly become unworkable to examine through manual approaches. We developed a protocol for acquiring species detections from audio recordings using a semiautomated procedure that requires a relatively small time investment from researchers to produce a preliminary list of species occurring in a study area and provides a validation data set [46][47][48][49] (Figure 1). RFCx Arbimon is a free, cloud-based, user-friendly acoustic analysis platform for storage, management, visualization, and analysis of ecoacoustic data. Using the spectrograms generated by Arbimon, two of the coauthors with extensive experience in audio surveys (G.A.L. and T.N.M.) manually inspected all 1-minute recordings from two nonconsecutive days at each site (576 1minute recordings per site were inspected) to search for species' sound signals and noted if each species call was present in each record [49]. During this process, hereafter referred to as "manual annotation", animal sounds with a high signal-to-noise ratio were identified and selected as the templates for template-matching analyses, known as Pattern Matching (PM) analysis in the RFCx Arbimon system [49] (Figure 1). During these steps we identified and tagged other characteristics and sound sources such as the quality of recordings, noises caused by internal malfunction of AudioMoth devices, external interference, anthrophony (e.g., noise of machines, vehicles, and voices), and domestic animals. This manual annotation of the data was an important step in providing a consistent preliminary list of species recorded by ARUs throughout the study area, selecting good templates for PM analysis, and generating validated data sets [23,46,49].
The preliminary species list created by the manual annotation of a subset of recordings provided the templates that were subsequently used to select species of greatest conservation need (SGCN; https://www1.usgs.gov/csas/swap (accessed on 30 September 2021)), native and endemic species and all IAS detected for the ecological analyses. We ran a PM analysis for a total of 65 species in the RFCx Arbimon platform (37 SGCN, 16 IAS and 12 extra species). Using available audio libraries (e.g., Xeno-Canto) we also created PMs for other SGCN and IAS that are known to occur in Puerto Rico but were not detected during the manual validation or that did not have acceptable templates. Use of external templates did not result in the detection of any new species. Pattern Matching, a template-based detection algorithm, take a sound template and an audio file playlist as inputs and searches a selected playlist for signals that correlated with the template in the spectrogram domain [46]. All audio segments with a correlation equal to or greater than a user-chosen threshold were detected as regions of interest (ROI) and displayed in a graphical user interface (GUI) in a grid format. We assigned a correlation threshold equal to 0.2 and ran the models on all sites included in the playlists. The selection of a low threshold resulted in a high number of false positives, though the number of false negatives was assumed to be negligible. M.C.C., G.L., and T.M. manually reviewed the template-matching results. In this step, we annotated the results as either positive or negative to indicate the corresponding species' presence or absence, respectively. This ensured that the final data set used in the analyses only included expert-verified detections and excluded all false-positive detections.  [50]). During this process, important environmental variables should be gathered to assist with survey design; for example, spatial data can provide the main habitat types and most appropriate locations for the deployment of the autonomous recording units (ARUs; e.g., land cover, elevation, watercourses, trails, and roads).
Step 2-Set temporal PAM designs depending on the goals of the project and deployment of ARUs in the field. The literature and pilot studies can assist in more effective adjustments to sample target species.
Step 3-The large volume of recordings collected during large-scale and long-term studies are processed via a software or web platform to store, visualize, manage, and analyze the audio recordings efficiently.
Step 4-A subset of recordings is manually inspected (i.e., "manual annotations") to search for target acoustic signals with a high signal-to-noise ratio and select the templates for further use in an Arbimon Pattern Matching analysis (PM), also known as Template Matching analysis.
Step 5-In the semiautomatic approach, the PM analysis is a crucial step in searching for and detecting target acoustic signals through all recordings in a playlist. There is a trade-off in choosing the PM threshold: a low threshold returns a high number of false positives while reducing the possibility of false negatives. We chose to use a low PM threshold and manually validated a subset of PM per site (i.e., best-scored ROI per site per day). This process enhanced the chances of detecting at least one call of the species at the site and aided in the removal of false positives. In addition to producing useful ecological data per se, the PM provides training data for artificial intelligence models (e.g., a convolutional neural network [46]). Training data can also be acquired through automatic event detection of acoustic signals through audio event detection analyses that can be clustered to identify groups with similar sound characteristics. Even for this automatic identification process, manual post validation of a subset of recordings is necessary.
Step 6-The last step is to summarize all information and make the ecological information available to decision makers. This step can be the final point of PAM or can provide a feedback loop for new questions to be investigated.
We created two playlists on which to run the PM: a diurnal playlist with all recordings from 5 a.m. to 6 p.m. and a nocturnal playlist with all recordings from 6 p.m. to 5 a.m. The diurnal playlist was applied to 41 bird species and three mammal species; the nocturnal playlist was applied to 18 frog species and three bird species. The resulting ROIs from the PM analysis were manually inspected in a grid view and validated as true or false positives by experts. For validation, we filtered the results to only show the best-scored ROI per site and day to acquire detection and nondetection data to be used in the occupancy models ( Figure 1). The PM analysis of all species detected using all recordings together with the manual validation of the best-scored ROI per site and day took two ornithologists (GAL and TNM) approximately 15 days. Using these validations, we created a table of sampling sites by sampling days for each species (i.e., 1 if the species was detected, otherwise 0). Since our main objective here was not to investigate the daily activity of these species but to maximize detection in each sampling site, we chose the period with the greatest volume of soniferous data for the birds and frogs based on our experience working with the birds and frogs of Puerto Rico. Thus, we highlight that the polar plots for birds and mammals were biased for the diurnal period while those for frogs and nocturnal birds were biased for the nocturnal period. However, if the research goals involve investigating the daily activities of species, this approach can be easily adjusted to incorporate both periods of day for each species.
In this paper, we used the standard biological sense adopted by Simberloff (2013) [2] that defines invasive species as "species that arrive with human assistance, establish populations, and spread", contrary to the usage adopted by policymakers who assume that IAS are only introduced species that cause some proven negative impact on diversity or an ecosystem, although there is evidence suggesting that most of these IAS may be threats to Puerto Rico's biodiversity.

Explanatory Variables
We collected environmental data from multiple online repositories administered by the Multi-Resolution Land Characteristics Consortium, National Oceanic and Atmospheric Administration, Open Street Map, USDA Caribbean Climate Hub, and USGS EarthExplorer. Data gathered included GIS layers of elevation, climate, forest age, landuse type, protected area extent, and roads and trails. For each sampling location, a buffer was added with a radius of 200 m around the site (area = 12.57 ha). Data for inclusion as environmental covariates in the ecological analyses were extracted for the 200 m site buffers using the Zonal Statistics and Tabulate Area tools in ESRI ArcMap (version 10.8; Environmental Systems Research Institute, Inc., Redlands, CA, USA).
We selected 13 explanatory variables that have been reported to have an effect on fauna from the GIS layers a priori [38,48,51,52]. We checked multicollinearity among the explanatory variables used in the occupancy models through the variance inflation factor (VIF) function of the "usdm" package [53] in R version 3.6 [54]. This function progressively excludes collinear covariates through a stepwise procedure. VIFs were calculated using two methods: VIFcor (threshold = 0.7) and VIFstep (Threshold = 10). The following variables with low collinearity were used in occupancy models: elevation, mean annual precipitation, percent of forested area between 34-54 years old, percent of forested area older than 54 years, percent of area within a protected area boundary, percent of canopy cover, distance from roads, and proportion of built-up area (Appendix B).
We used the Generate Tessellation tool in ArcMap to create a hexagon grid covering the entire extension of Puerto Rico and major offshore islands. The hexagon grid had a cell area equal to that of the 200 m buffer applied to survey sites. We exported values for the explanatory covariates for each hexagon, which were used to make maps of predicted occupancy for each species using the top-ranked model.

Single-Species Occupancy Model
Since the IAS may have experienced different introduction histories on the islands and have different capabilities to colonize new islands, we chose to model species occupancy only for the islands where the species was detected in at least five sampling sites. We investigated occupancy while accounting for imperfect detection of invasive species by fitting a hierarchical single-season occupancy model [55][56][57] in a Bayesian framework using the "ubms" package in R [58,59]. The occupancy model made use of repeated observations in the sites to disentangle the observational component (i.e., detection/nondetection) from the state variable of interest (i.e., "true" occupancy [38,55]). The model had four central assumptions: (i) independence between sampling sites, (ii) independence between repeated observation occasions, (iii) absence of misidentification of the focal species (no false positive error), and (iv) no colonizations or extinctions during the study period; that is, the sampling sites were expected to be "closed" to the occupancy state of species during the study period [55,60,61]. Assumption of a closed occupancy state may be relaxed if the changes in the occupancy state are random, and therefore the occupancy parameter should be interpreted as the probability that the species "uses" the habitat [62].
We applied a sequential approach to identify the most parsimonious model for each species rather than run all possible model combinations [63]. Firstly, we held the occupancy probability constant (ψ(.)) and fit four a priori candidate models for detection probability: null model, elevation only, linear and quadratic effect of Julian day, and full model (Table 1). Then, we held the top-ranked detection model ( (TM)) and modeled a set of a priori candidate models for the occupancy probability of species (Table 1). Since the availability of the spatial data containing the predictor variables varied across islands, the number of candidate models for the species also varied depending on the island on which the species was detected ( Table 1). The parameters of both detection and occupancy were modeled as a logit function [55,59]. We normalized all continuous explanatory variables to have a mean of 0 and a standard deviation of 1 [61].
Even while expecting that 200 m was a sufficient distance to guarantee sampling-site independence, we made the decision to run a spatial occupancy model to account for spatial autocorrelation in the occupancy probability, since including spatial autocorrelation in occupancy modeling can improve species prediction and distribution maps [56,59,64]. We modeled the spatial occupancy model using restricted spatial regression (RSR) that dealt with the random effect uncorrelated with the fixed covariates [64]. We considered that two sites were neighbors if the distance between them was equal to or below 1000 m (we set the threshold at 1000). RSR has been shown to be a good choice for modeling occurrences while accounting for imperfect detection and spatial autocorrelation [59,64]. Due to the long time required to run models with spatial autocorrelation, we chose only to fit RSR with the top-ranked nonspatial occupancy model to determine if incorporating spatial autocorrelation improved the model fit. Table 1. A priori candidate models fitted in the occupancy analyses for the main island of Puerto Rico and Culebra, Mona, and Vieques. TM = top model; precip = precipitation (cm); PA = proportion of protected area; canopy = proportion of canopy cover; date = Julian date (we considered Julian day 1 as the first day of a device starting to record in the field, which was 1 March 2021); date 2 = square of Julian date; fa2 = proportion of forest cover aged between 32 and 54 years; fa3 = proportion of forest cover older than 54 years; dist_road = distance from roads (m); built_up = built-up proportion; RSR = restricted spatial regression. We considered a distance threshold of 1000 m for two sites to be considered a neighborhood in the spatial occupancy model.

Detection Models
Occupancy Models Main Island of Puerto Rico We did not acquire precipitation, fa2, or fa3 layers for Mona Island; the entire island is protected. Therefore, the occupancy model for this island considered only elevation and canopy cover as predictor variables in the competing models. There was minimal built-up class variation on Culebra and Vieques and fa2 and fa3 layers were not available for these islands, so we were not able to include these variables as predictors of the occupancy model.
We compared the models using Bayesian leave-one-out cross-validation (LOO) [59,65]. The log pointwise predictive density (elpd) was calculated for each model using the modSel function of the "ubms" package [59]; we ranked elpd from highest to lowest since the model with the largest elpd value corresponded to the model with the best accuracy [65]. Additionally, we presented the difference between the elpd of models and the elpd of the top-ranked model (Δelpd) along with standard error of elpd (se of Δelpd). We checked the chains' convergence of all parameters of the top-ranked model by assessing the effective sample size (n_eff) and R-hat statistic and by visually inspecting traceplots [59]. Furthermore, we assessed the model fit through residual plots for the state and observation variables.
Based on previous studies, we assumed that the IAS would benefit from high levels of human disturbance and present higher occupancy probabilities in places characterized by more intense human activities [45,66]. Therefore, we expected that invasive species occupancy would be negatively related to elevation because human activities are more common and intense in low-elevation areas (e.g., human settlement and agriculture) in Puerto Rico. A higher proportion of protected areas and canopy cover would be inhibitors of invasive species occurrence by representing more preserved environments [66][67][68]. Since the anuran IAS considered in our study do not present striking adaptations to grow and reproduce in conditions with a low availability of water, we predicted a positive relationship between these species' occupancies and the mean annual precipitation. Additionally, we expect that a proximity to roads and a high proportion of built-up area in a location would positively influence the occupancy of IAS [66].

Results
We based our findings on 1,773,287 1-minute recordings from 8.9 TB of soundscape data. Our manual inspection of recordings together with PM analysis generated an overall list of 95 species detected throughout the Puerto Rican archipelago: 74 birds, 18 frogs, and 3 mammals (Appendix C). We found 92 species on the main island of Puerto Rico, followed by Vieques, Culebra, and Mona with 31, 25, and 19 species, respectively (Appendix C). Overall, 16 species were considered invasive species (Appendix C) and are potential threats to wildlife in the Puerto Rico: 10 birds ( None of the invasive species were detected on Desecheo Island; therefore, this island was not used in the occupancy model. The wild goat (C. hircus) was detected at 21 sites, all of which were on Mona Island. Feral horse (E. ferus caballus) was detected at 11 sites on Vieques and two sites on the mainland; therefore, the occupancy model was performed only for Vieques. Chickens (G. gallus domesticus), Venezuelan troupial (I. icterus), shiny cowbird (M. bonariensis), and domestic dog (C. lupus familiaris) were the most widespread species, detected at 165, 131, 78, and 76 sites, respectively. Chickens (the only IAS for which the occupancy model was fitted for more than one island) were detected at 134 sites on the main island, 27 sites on Culebra, 3 sites on Vieques, and 1 site on Mona. Five bird species (A. amazonica, A. viridigenalis, B. ibis, M. monachus, and S. decaocto) had low raw detections and were detected at few sites (less than 10 sites); thus, we did not run occupancy models or a call activity pattern analysis for them. The other IAS detected were found at between 12 and 24 sites (American bullfrog (L. catesbeianus), house sparrow (P. domesticus), Cuban tree frog (O. septentrionalis), cane toad (R. marina), and white-winged parakeet (B. versicolurus)).
The number of calls recorded and the peak of activity varied greatly between species ( Figure 2). The group with the highest number of detections (i.e., best-scored ROI per site and day) was birds (e.g., chicken had more than 750 raw detections, Venezuelan troupial had 450 detections, and shiny cowbird had 194 detections), followed by mammals (ranging from 23 to 148 detections) and frogs (ranging from 53 to 75 detections). The bird species had a high activity peak early in the morning (between 5 a.m. and 7 a.m.). The peak activity of frog species varied between 1 a.m. and 5 a.m. depending on the species. The American bullfrog had a peak in calling activity at dawn (about 6 a.m.); the cane toad showed a peak at 2 a.m.; and the Cuban tree frog showed two peaks of activity, with the strongest at 3 a.m. and a second less-intense peak at dusk. Mammals did not show a clear calling activity peak.
We fit occupancy models for 11 IAS (all 3 frog and mammal IAS and 5 species of birds). In general, the occupancy models presented good convergence and fit well for the species when checked through the inspection of traceplots (Appendix D) and using R-hat statistic values < 1.1 (Appendix E). However, the "top-ranked" spatial occupancy model of the Cuban tree frog did not show good convergence and efficiency diagnostics for Markov chains (ψ(fa2 + fa3 + RSR-1000); ρ(elevation + date + date 2 )) even after running the model with 200,000 interactions; thus, we used the second-best model for the species to show the parameter estimates (Table 2; Appendix E). In general, the probability of detecting IAS throughout Puerto Rico was medium to low and varied widely, ranging from 0.002 to 0.63 (Appendix E); most species were below 0.3. The best model that explained the detection probability of all species (except for feral horse, white-winged parakeet, and the two domesticated species) included the three predictor variables: elevation, Julian date linear, and Julian date quadratic (Appendix E). In general, the Julian date positively influenced a species' mean detection probability. The relationship between elevation and the detection probability varied depending on the species. Calling activity patterns by group. We used the three first letters of the genus and a specific epithet for species in the legends (e.g., Gallus gallus domesticus = GALGAL). We emphasize that the polar plots for birds and mammals were biased to the diurnal period while that for frogs was biased to the nocturnal period. Table 2. Three top-ranked models for eleven alien invasive species in Puerto Rico. ELPD = expected log pointwise predictive density; ΔELPD = difference of ELPD relative to the top-ranked model; se of Δelpd = standard error of elpd; RSR = restricted spatial regression (threshold = 1000 m); PA = proportion of protected area; fa2 = proportion of forest cover aged between 32 and 54 years; fa3 = proportion of forest cover aged older than 54 years; dist_road = distance from roads (m); built_up = built-up proportion; date = Julian date (we considered Julian day 1 as the first day of a device starting to record in the field, which was 1 March 2021); date 2 = square of Julian date.

Species
Model Occupancy probability ranged from 0.002 (American bullfrog) to 0.67 (wild goat on Mona). The three amphibian IAS showed a very low occupancy probability (lower than 0.08). The results showed that elevation and precipitation were the variables that best explained the occupancy probability of IAS in Puerto Rico and appeared in the top-model of six species, followed by proportion of forest cover aged between 32-54 years (fa2) and forest cover older than 54 years (fa3; Table 2). The best model selected for four of the species (shiny cowbird, cane toad, wild goat, and house sparrow) was the null model containing only the intercept.
We created maps of model-predicted species occupancy probability across Puerto Rico and outlying islands for the 11 invasive species with occupancy models (Figure 3; Appendix F). Here, we exemplify the predicted occupancy of the Venezuelan troupial (see Appendix F for the occupancy prediction maps for the other species). We chose the Venezuelan troupial as an example because it was the species that showed the top-ranked model with the highest difference between the other a priori candidate models, which suggested more confidence in the top-ranked model (higher ΔELPD between the topranked and second-best models). We limited the forecast map to the islands where the occupancy model was fitted for each species to diminish spurious predictions because the gradients between islands were different. The point-estimates occupancy probability map for the Venezuelan troupial in Puerto Rico in 2021 under the top-ranked occupancy model suggested a high probability of the species occurring in regions close to the coast, mainly in the south and southwest of the main island ( Figure 3). We developed a website (https://bio.rfcx.org/puerto-rico-island-wide (accessed on 14 June 2022)) with the purpose of data sharing and summarizing the results of the study for the use of government agencies (e.g., Departamento de Recursos Naturales y Ambientales (DRNA), U.S. Fish and Wildlife Service) and the general public (e.g., educators and birdwatchers). The website contains information on the vocalizations, distribution, and ecology of the species detected at the 841 sampling sites in Puerto Rico during the project, including a searchable database with detection data, presence/absence maps, and occupancy maps for all birds, mammals, and anurans recorded (Appendix G). All maps, plots, and data used to generate the figures can be freely downloaded. The data and figures on the website are continuously updated according to new acoustic data as they become available.

Discussion
The acquisition of population baseline data for native and invasive species is a fundamental step in monitoring and managing wildlife in dynamic land cover and climate change scenarios. However, detecting and monitoring animal species at large spatiotemporal scales, especially in the tropics, remains a significant challenge. In this study, we presented an end-to-end acoustic monitoring pipeline that was able to detect several soniferous native, endemic, threatened, and invasive species; access their current population status; and summarize and display results in a user-friendly platform.
Converging with our expectations, the occupancy probabilities of IAS were generally lower in environments with less human disturbance. For example, the occupancy probabilities of some IAS were negatively associated with elevation (five species) and forest older than 54 years (four species). These findings suggested that IAS may be favored by human activities because species occupancies were higher in new forest areas that were expected to be more disturbed and human activities are more common and intense at low elevations in Puerto Rico. Our findings were in agreement with an overall correlation between anthropogenically disturbed habitats and invasive species of plants, invertebrates, fishes, birds, frogs, and mammals [45,66,67,69]. Our results revealed that elevation and precipitation were the most important variables for explaining the distribution of most soniferous IAS (the best model for 6 of 11 IAS contained elevation and precipitation as explanatory variables), followed by proportion of forest cover aged between 32-54 years (fa2) and forest cover older than 54 years (fa3).
Free-range pets and domesticated species can also have a negative impact on native species, especially in island ecosystems [68,70]. Although the chicken, dog, wild goat, and feral horse are domesticated species, we decided to include them in the acoustic analysis because they can impact native wildlife directly (e.g., predation) and indirectly (e.g., spreading diseases and impacting on vegetation) [71]; moreover, some feral populations exist in the archipelago. Some studies have shown that domesticated species (such as dogs) can occur widely within protected areas and may represent a threat to native species [72,73]. In contrast, our occupancy models suggested a lower probability of dogs and chickens occupying the protected areas on the main island of Puerto Rico (Appendices E and F), indicating that protected areas can offer some level of protection against domesticated species.
Our knowledge about the main drivers of IAS distribution is still limited, which was reflected in the model-selection process. The null model was the top-ranked model of four species, indicating that the explanatory variables utilized in the models were not good predictors of spatial variation of these species. The anuran IAS showed a low probability of occupancy (<0.07) in the landscape of the main island of Puerto Rico, which may have reflected the low natural availability of aquatic environments in the landscape (i.e., natural low occupancy probability of species) or may have been a result of our sampling design, which was not focused on lentic systems. The three frog IAS had call activity associated with ponds; a new sampling process could easily be designed to include more lentic systems, which would increase the chance of finding frogs with an aquatic life stage.
Estimating the probability of detection and occupancy of IAS can facilitate more efficient management actions because the estimations of parameters related to species occurrence on the landscape will be unbiased and provide an uncertainty measure [74,75]. In our study, most IAS showed a higher probability of detection at the end of the sampling period (late May and early June 2021), suggesting that most species were more vocal and thus more easily detected by PAM at the end of the early high rainfall season. These findings were congruent with breeding activity peaks recorded for terrestrial birds in Puerto Rico [52]. This positive relationship was restricted to the temporal range of the study (March-June) because a year-round sampling design can have more variability and alter the relationship between the Julian day and detection probability.
Despite the large volume of data analyzed through PM, five birds were detected at only a few sites (<10), and we did not run occupancy models for them due to the small number of detections (see Appendix H for detection locations). This low detection rate could correspond to a "real" low occurrence/density of the species in the study area or might have resulted from the species vocalizing sparingly. Even with a low number of detections, knowing where these species were found can present a valuable opportunity to prioritize surveying of locations closely similar in geographical and environmental space where the species were detected. Although working with few detection points (or even one detection point) is challenging and has several limitations, it can be useful in directing resources and field survey efforts to find undetected populations of a "rare" species in a region [76] and monitor possible early-stage expansions.
The detection/nondetection of species is a baseline outcome of audio data analysis that can be used in various ecological analyses that make use of this type of data [28,36,38]. For example, the diel and annual activities of species are fundamental aspects of their life history, and the knowledge of it can be used along with artificial advertisement calls to trap and manage IAS [35,77,78]. We found that diel call activity of the species groups greatly varied, suggesting that monitoring and management programs should focus on specific periods of the day to increase their chance to detect and capture the IAS.
Our end-to-end pipeline provided an effective method for detecting several native, endemic, threatened, and IAS (16 species) in Puerto Rico, including birds, frogs, and mammals (95 species were detected overall). The manual validation of best ROI per site per day of PM analysis detected around 2000 call events of IAS. We took less than two days to validate the PM of the 16 IAS following our workflow. The fast analysis of this vast data set (1,773,287 1-minute recordings) was only possible through a user-friendly web-based platform with annotation functionality and the creation of user-defined playlists that allowed us to combine manual annotations with PM analysis and validations by experts (see Figure 1). While users can take advantage of a variety of available software platforms or develop their own code to analyze PAM data, multifunctional tools that have userfriendly interfaces are necessary to speed up and increase usage of PAM by a wider audience with varying skill sets [79]. Free user-friendly interfaces may be an effective way to implement early-stage detection and assist in the long-term monitoring of IAS populations by conservationists, wildlife managers, and decision makers. Cloud-based platforms such as those used in this project also can facilitate the inclusion of citizen scientists and other experts to improve and speed up the validation of data from the target groups.
Detection and monitoring of IAS is of increasing importance for informing conservation management. Tools using an intuitive GUI can improve data exploration and narrow communication and knowledge gaps between the scientific community and other groups [80]. Here, we introduced a webpage as part of the last step of the Arbimon ecosystem, filling a critical need for a practical and intuitive way to summarize, display, and share ecological results from acoustic-monitoring processes so that the data generated can be easily used and shared by and with environmental agencies. This web-based tool was designed to display biodiversity indicators such as number of detected species, activity patterns, and species occupancy over maps and plots that can support species management.
Previous studies have shown the benefit of combining acoustic monitoring with occupancy modeling to understand native species distribution [38,51,[81][82][83]; in accordance with these findings, our study reinforced that this approach can be useful to understand the distribution of soniferous IAS as well. Additionally, our approach generated detection/nondetection data from species of greatest conservation need beyond IAS that can be used in more complex models (e.g., two-species or multispecies models) to assess relationships and interactions between IAS and SGCN, which can help researchers to understand the potential effects of IAS on native species detection and occupancy probabilities [84][85][86]. PAM has been shown to be very useful for investigating IAS, and there remain a number of avenues for expansion. Other studies have demonstrated the usefulness of PAM as a consistent tool to examine sounds in nature, involving a range of topics [27] such as monitoring native wildlife species [49,82,87], disturbance from human noise [88,89], agricultural pests [90], ecosystem functions [91], disease-transmitting mosquitoes [92], gunshots [93,94], and illegal timber harvests [95]. Although automated techniques are emerging to deal with the massive amount of acoustic data that are intrinsic to this monitoring tool, many of these methods still require manual examination of sounds for training and validating models [47,[96][97][98]. Therefore, manually processing a large amount of recorded data continues to be one of the biggest challenges of PAM, and forthcoming studies may benefit from the pipeline we introduced. Furthermore, the usefulness of passive acoustic monitoring in combating invasive alien species should be boosted if used in actions organized and interconnected globally alongside other emerging tools such as environmental DNA, GIS analysis, camera traps, and citizen science [29,33,99]. We also emphasize the need to develop a long-term real-time alert system using artificial intelligence models to keep a vigilant eye on the early detection of alien invasive species, which can be easily driven through passive acoustic monitoring.

Conclusions
In summary, we showed that passive acoustic monitoring is a flexible, powerful, and practical tool for generating baseline population data for soniferous native and alien invasive species. The end-to-end pipeline that we presented, which can quickly provide data on the presence/absence of species, was used to evaluate the spatial and temporal distribution of IAS. Our results led to the conclusion that the occupancy probabilities of the soniferous IAS were primarily related to areas with the highest human activities. Finally, we present the last step of the Arbimon ecosystem which contains webpages that provide decision makers and wildlife managers with results acquired through PAM in a user-friendly way.

Data Availability Statement:
The detection and nondetection data used during this study is publicly available online at https://bio.rfcx.org/puerto-rico-island-wide and https://arbimon.rfcx.org/project/puerto-rico-island-wide/dashboard. Appendix A Figure A1. Map of the sampling sites across Puerto Rico from March to July 2021. For each sampling site, an AudioMoth device was deployed. The light-green polygons represent protected areas. Figure A2. The AudioMoth device, an open acoustic, lightweight, and affordable autonomous recording unit used to monitor the soundscape in Puerto Rico during 2021. The device hardware, which included a lithium-ion rechargeable battery (blue packet in the photo), was placed in a protective waterproof plastic case with a silica packet and then deployed in a tree/vine/shrub at a 1.5 m height.

Appendix B
Distribution of the eight predictor variables for the occupancy models of invasive species in Puerto Rico.    Appendix C Table A1. Table of 95 species detected (74 birds,

Species
Island

Appendix D
We accessed convergence and mixing across Markov chains for each parameter through visual inspection of traceplots from the best occupancy model of 11 invasive species in Puerto Rico. We used three Markov chains for all the models. When the Markov chains (characterized by different colors in the figures below) showed a random scatter around a mean value, this suggested a good mixing and convergence. The x-axis represents the number of iterations for each chain in the order in which they are drawn. The maximum number of interactions of the models varied (so the x-axis varied as well) because some models need more iterations in order for the Markov chains to achieve convergence of the parameters. The y-axis represents the estimated values of the parameter. Figure A7. Traceplots from the best spatial occupancy model of Venezuelan troupial (Icterus icterus) in Puerto Rico: (ψ(intercept + elevation + precipitation + RSR-1000); (intercept + elevation + date + date2)). Note: alt = elevation; precip = mean annual precipitation; date = Julian date (we considered Julian day 1 as the first day of a device starting to record in the field, which was 1 March 2021); date2 = square of Julian date; RSR = restricted spatial regression (threshold = 1000 m). Figure A8. Traceplots from the best occupancy model (ψ(intercept + elevation + precipitation); (intercept + elevation)) of white-winged parakeet (Brotogeris versicolurus) in Puerto Rico. Note: alt = elevation; precip = mean annual precipitation. Figure A9. Traceplots from the best occupancy model of chicken (Gallus gallus domesticus) in Puerto Rico: (ψ(intercept + elevation + precipitation + protected area + canopy cover + fa2 + fa3 + distance of road + built-up + RSR-1000); (intercept + elevation)). Note: alt = elevation; precip = mean annual precipitation; protArea = proportion of protected area; canopy = proportion of canopy cover; fa2 = proportion of forest cover aged between 32 and 54 years; fa3 = proportion of forest cover aged 55 years and older; dis_road = minimum distance from road; builtup = built-up proportion; RSR = restricted spatial regression (threshold = 1000 m). Figure A10. Traceplots from the best occupancy model (ψ(intercept); (intercept + elevation + date + date2)) of shiny cowbird (Molothrus bonariensis) in Puerto Rico. Note: alt = elevation; date = Julian date (we considered Julian day 1 as the first day of a device starting to record in the field, which was 1 March 2021); date2 = square of Julian date. Figure A11. Traceplots from the best occupancy model (ψ(intercept); (intercept + elevation + Julian day + Julian day2)) of house sparrow (Passer domesticus) in Puerto Rico. Note: alt = elevation; date = Julian date (we considered Julian day 1 as the first day of a device starting to record in the field, which was 1 March 2021); date2 = square of Julian date. Figure A12. Traceplots from the best occupancy model of American bullfrog (Lithobates catesbeianus) in Puerto Rico: (ψ(intercept + elevation + precipitation + protected area + canopy cover + fa2 + fa3 + distance of road + built up); (intercept + elevation + date + date2)). Note: alt = elevation; precip = mean annual precipitation; protArea = proportion of protected area; canopy = proportion of canopy cover; fa2 = proportion of forest cover aged between 32 and 54 years; fa3 = proportion of forest cover aged 55 years and older; dis_road = minimum distance from road; builtup = built-up proportion; date = Julian date (we considered Julian day 1 as the first day of a device starting to record in the field, which was 1 March 2021); date2 = square of Julian date. Figure A13. Traceplots from the second-best occupancy model (the top-ranked model did not converge well) of Cuban tree frog (Osteopilus septentrionalis) in Puerto Rico: (ψ(intercept + fa2 + fa3); (intercept + elevation + date + date2)). Note: alt = elevation; fa2 = proportion of forest cover aged between 32 and 54 years; fa3 = proportion of forest cover aged 55 years and older; date = Julian date (we considered Julian day 1 as the first day of a device starting to record in the field, which was 1 March 2021); date2 = square of Julian date. Figure A14. Traceplots from the best occupancy model of cane toad (Rhinella marina) in Puerto Rico: (ψ(intercept); (intercept + elevation + date + date2)). Note: alt = elevation; date = Julian date (we considered Julian day 1 as the first day of a device starting to record in the field, which was 1 March 2021); date2 = square of Julian date. Figure A15. Traceplots from the best occupancy model of domestic dog (Canis lupus familiaris) in Puerto Rico: (ψ(intercept + elevation + precipitation + protected area + canopy cover + fa2 + fa3 + distance of road + built up); (intercept + elevation)). Note: alt = elevation; precip = mean annual precipitation; protArea = proportion of protected area; canopy = proportion of canopy cover; fa2 = proportion of forest cover aged between 32 and 54 years; fa3 = proportion of forest cover aged 55 years and older; dis_road = minimum distance from road; builtup = built-up proportion. Figure A16. Traceplots from the best occupancy model of wild goat (Capra hircus) in Puerto Rico: (ψ(intercept); (intercept + elevation + date + date2)). Note: alt = elevation; date = Julian date (we considered Julian day 1 as the first day of a device starting to record in the field, which was 1 March 2021); date2 = square of Julian date. Figure A17. Traceplots from the best occupancy model of feral horse (Equus ferus caballus) in Puerto Rico: (ψ(intercept + elevation + precipitation); (intercept)). Note: alt = elevation; precip = mean annual precipitation.

Appendix E
Below is the output summary from the best occupancy model used for the prediction maps for 11 invasive alien species in Puerto Rico.    Appendix F Figure A18. Prediction maps using the top-ranked model (mean expected occupancy probability) with the model prediction uncertainty (standard deviation) for the invasive alien species found during passive acoustic monitoring across Puerto Rico. When the top-ranked model of the species was the null model for occupancy or did not converge well, we used the model that ranked second-best to draw the prediction maps (i.e., for Molothrus bonariensis, Osteopilus septentrionalis, Rhinella marina, and Capra hircus), and as such, we highlight that these predictions should be interpreted with caution because there was a high uncertainty associated with them. The circles represent the sampling sites where the species were detected by the AudioMoth devices. The prediction for wild goat was made for Mona Island, the prediction for feral horse was made for Vieques island, and the predictions for the other species were made for the main island of Puerto Rico.

Appendix G
We created a biodiversity insight page as a new feature of Arbimon web-based platform for use by government agencies (e.g., Departamento de Recursos Naturales y Ambientales (DRNA), U.S. Fish and Wildlife Service), and the general public (e.g., educators and birdwatchers). The platform provides access to the main results of this study and information on the calls, behavior, and biology of the species detected in the ARU (https://bio.rfcx.org/puerto-rico-island-wide (accessed on 14 June 2022)). The main features of this page include information on the vocalizations, distributions, and ecologies of most bird and anuran species in Puerto Rico. The page also includes a searchable database that contains the detection data, presence/absence maps, and occupancy maps for the birds and anurans detected at the 841 sampling sites. All maps, plots, and data used to generate the figures can be downloaded. Since the project is still in progress, the data on the page are constantly being updated as more data are validated. Figure A19. Dashboard: This page presents an overview of the project, including background information, objectives, funding, and stakeholders. General results are also presented, such as the total number of detections, sampling sites, species detected, and the number of species in each threat category according to the International Union for Conservation of Nature (IUCN). This page also features a map with all sampling sites and the number of species detected at each site. Figure A20. Species Richness: This page presents the results of the number of species detected and allows for exploration of those data in greater detail. The data are presented as a bar graph with aggregated results that can be filtered by region/site, date, and taxonomic group. Below the bar graph is an accompanying map depicting species richness in the selected sites; the point size reflects the number of species detected at each location. Below the map is a graph displaying the number of species according to time of day, day of the week, month, or date. Figure A21. Activity Overview: This page offers an overview of temporal and spatial vocal activity patterns. It can be used to summarize detection frequency, number of raw detections, or naive occupancy results (i.e., proportion of occupied sites), with a focus given to general patterns at the community level. As with the Richness page, the user can use the filters to select results from taxonomic groups, sampling sites, or specific periods and visualize the results in a map or graphs that present the results according to the hour, day of the week, or month. Figure A22. Spotlight: This page offers an in-depth look at the raw detection and occupancy of individual species. On this page, the user can select the species that were detected in the project data using either the scientific or common name. Basic information about the species, photo, call example, population status according to IUCN, detection frequency, and naive occupancy values are presented. The first map shows the detection results by site and allows for filtering by detection frequency, the number of detections (raw), and naive occupancy (i.e., proportion of occupied sites). A second map, when available for the species, shows the predicted occupancy (i.e., probability of occupancy) in Puerto Rico. In addition, a graph displays either detection frequency or the number of detections results according to the hour, day of the week, or month. Like the Richness and Activity pages, the data set can be subset by selecting one or more sampling sites or sampling periods.