Allele Surfing and Holocene Expansion of an Australian Fig (Ficus—Moraceae)

The creek sandpaper fig of southeastern Australia, Ficus coronata Spin, is culturally significant to Australian traditional owners who made use of the leaves to smooth timber and ate the fruit. The species is thought to have a long history on the continent, with some suggesting a Gondwanan origin. However, distributional patterns and overall ecology suggest a recent expansion across suitable habitats. We used landscape genomic techniques and environmental niche modelling to reconstruct its history and explore whether the species underwent a recent and rapid expansion along the east coast of New South Wales. Genomic analysis of 178 specimens collected from 32 populations throughout the species’ New South Wales distribution revealed a lack of genetic diversity and population structure. Some populations at the species’ southern and western range limits displayed unexpected diversity, which appears to be the result of allele surfing. Field work and genetic evidence suggest a Holocene expansion which may have increased since European colonisation. We also present a novel method for detecting allele surfing—MAHF (minor allele at highest frequency).


Introduction
Post-glacial range expansions, from low latitudes towards the poles, have been reported for multiple species globally [1][2][3][4]. During the Pleistocene, the climate oscillated between glacial and interglacial periods, repeatedly forcing species to contract into refugia (or become extinct), then expand as conditions improved [5,6]. Periods of contraction condensed populations into isolated gene pools, while expansion enabled populations to grow and diverge in response to new environments [7,8] or admix on contact with new populations [9]. These contraction and expansion cycles formed the basis from which today's species distribution and population structure developed [10].
Species' range expansion can be described as having a leading edge, with the gaps being filled as others migrate behind. As repeat founding events migrate further from the source, genetic diversity is reduced and homozygosity increases. Colonising populations consisting of individuals derived from a larger source population start with a random subset of the source population's allelic diversity, equating to a loss of allelic diversity or genetic drift. Being random, genetic drift is not caused by the environment, although selective pressures will ultimately determine which alleles remain. Genetic drift and decreasing relatedness with increasing geographic distance [11] contribute to the process of isolation by distance (IBD [12][13][14]). This means opposite ends of a large continuous species distribution can end up with distinct allelic variation [12]. Low allelic diversity and weak population structure can identify recent range expansion [15], while refugial populations often have high allelic diversity and strong population structure [16].  As suggested by its common name, F. coronata has sandpaper-like leaves that were used by Aboriginal Australians to smooth timber; the fleshy purple fruit from female trees (the species is gynodioecious) is edible. The species name 'coronata' refers to the crown-like ring of bracts around each fig's ostiole. Fig fruits are eaten by many animals [34], including birds and bats [35,36], that can distribute the seed over long distances. Most commonly found around the NSW-QLD border (Figure 2), an area that includes multiple persistent rainforest refugia, the species is habitat-specific and highly dispersible, making it a good subject to study post-glacial expansion patterns.

Sampling Strategy
At the time of writing, F. coronata was recorded for five Australian state ries (Atlas of Living Australia (ALA) (2020) Ficus coronata occurrence recor at: ala.org.au, accessed on 30 May 2020). The species is most common in NS records, southeast QLD with 413 (classified as least concern, ALA 2021), nort toria (VIC) with 46 (listed as threatened, Flora and Fauna Guarantee Act 1 ened List November 2019), Northern Territory with five and Western Aust ages of collections with doubtful identifications for Northern Territory (N Australia (WA) and Far North Queensland (FNQ) were requested from the VIC and SA herbaria. None of the sighted specimens were F. coronata. O CANB 268023, is F. coronulata; MEL 1581926 and MEL 1581933 are possibly coronulata [33]. Three specimens, BRI AQ1005733, BRI AQ0008071 and M from northern QLD, are likely F. opposita. This makes F. coronata an endemic coast Australia (Figure 2). Using genome-wide reduced-representation sequencing (DArTseq) and environmental niche modelling (ENM), we investigated the landscape genomic patterns of F. coronata in order to ascertain a recent migration event into the southern range of its distribution. In particular, we ask: (1) can signals for rapid expansion of F. coronata be detected; and (2) can directionality of migration be determined?

Sampling Strategy
At the time of writing, F. coronata was recorded for five Australian states and territories (Atlas of Living Australia (ALA) (2020) Ficus coronata occurrence records. Available at: ala.org.au, accessed on 30 May 2020). The species is most common in NSW with 4333 records, southeast QLD with 413 (classified as least concern, ALA 2021), northeastern Victoria (VIC) with 46 (listed as threatened, Flora and Fauna Guarantee Act 1988-Threatened List November 2019), Northern Territory with five and Western Australia one. Images of collections with doubtful identifications for Northern Territory (NT), Western Australia (WA) and Far North Queensland (FNQ) were requested from the BRI, CANB, VIC and SA herbaria. None of the sighted specimens were F. coronata. One specimen, CANB 268023, is F. coronulata; MEL 1581926 and MEL 1581933 are possibly F. aculeata x coronulata [33]. Three specimens, BRI AQ1005733, BRI AQ0008071 and MEL 0274275A from northern QLD, are likely F. opposita. This makes F. coronata an endemic of south east coast Australia ( Figure 2). To investigate the hypothesis of a recent southern expansion, leaf material was collected from 178 F. coronata individuals from 32 sites throughout the species' NSW range (Table 1) following the Restore and Renew collection protocol [37]. Where possible, six individuals were collected from each population, though some populations had less than six individuals. Individuals at least 10 m apart and older plants were prioritised. Data for samples were collected using the Restore and Renew mobile app [38]. Samples were stored at −80 • C before freeze drying and then placed in airtight containers with silica beads.

Genome Scans and Data Analysis
A 5-10 mg section of leaf from each specimen was provided to Diversity Arrays Technology (DArT; http://www.diversityarrays.com, accessed on 30 May 2020) for extraction and sequencing. DArTseq uses restriction enzymes targeting low-copy sequences (mainly nuclear) and next-generation sequencing (NGS) to provide thousands of markers across the genome. DArT supplied genetic data in the form of 109,377 single-nucleotide polymorphisms (SNPs) for each sample using a proprietary analytical pipeline. The dataset was pre-processed with an in-house R package RRtools v. 1.0 (unpublished; Jason Bragg, Royal Botanic Garden Sydney) as per the Restore and Renew workflow described in [37], removing low quality (reproducibility ≤ 0.98 and missingness ≥ 0.05) and duplicate SNPs, resulting in 7492 SNPs that were used in the analyses.
Genetic structure and admixture analyses were performed with LEA, an R package for landscape and ecological association [39], and mapped with the R library oz [40] and mapplots [41]. The best value for K was determined using snmf from LEA by plotting cross entropy for K = 2 to 10 and selecting the most suitable K from the average of 10 runs. Data for populations and individuals were run separately to check for conformity.
Genetic data were converted to genlight format and Principal Component Analysis (PCA) was performed using the function glPca (Principal Component Analysis for genlight objects) of the R package adegenet [42,43]. Population genetic measures were obtained to explore the distribution of genetic variation and connectivity between populations. The fixation index (F ST ) was estimated using R package SNPrelate [44] using W&H02-relative beta estimator [45]. This was also used in the F ST by distance Mantel created with the R package Vegan [46]. Estimates for population allelic richness (A R ), expected heterozygosity (H E ), observed heterozygosity (H O ) and inbreeding coefficient (F IS ) were calculated with the R package diveRsity [47]. Private allele counts were performed with the R package poppr [48].
Range expansion analysis was performed with the R package rangeExpansion [49]. Genetic data were converted to snapp format and all regions were tested by supplying a NULL region list. Falistocco [50] notes that figs are commonly diploid so the ploidy was assumed to be two.
With the rangeExpansion results lacking sensitivity to a recently expanded population such as F. coronata and minor allele frequency (MAF) known to strongly affect inferred population structure [51], we experimented with using MAF to detect allele surfing. A table of minor alleles for each site was created with the R package RRtools. We then counted the occurrences of each minor allele for each population. An allele was regarded as MAHF (minor allele at highest frequency) for the population with the highest occurrence count. Each population can have 0 or more MAHF but an allele can only be MAHF for one population. All MAHF were counted for each population and graphed. This provides a basic visual of minor alleles being lost (through processes such as bottlenecking) and gained (stochastically, adaptively or through allele surfing).
Environmental niche modelling (ENM) was used to estimate possible past and future regions with suitable climate for this species, and uncover possible refugia and expansion directionality. The R library Kuenm [52] was used to create ENMs following the steps outlined in https://github.com/marlonecobos/kuenm (accessed on 5 January 2021). The package automates the creation of multiple Maxent models [53] using various regularization parameters, combinations of feature classes and environmental predictors. The package can also test all models it creates and select an optimal model and transfer this to different environmental layers or eras.
Layer names for environment sets LGM and MH were renamed and ordered to match those in the current and future sets. All environmental sets were cropped to include Papua New Guinea and Eastern Australia. Four of the 19 environmental layers were removed because of their possible bias effect [59], while others were removed after tests with maxent showed they had no effect of the model. The remaining layers were placed into three Images of the models were generated by copying the Java code contained in the .bat files and running it in a bash terminal window.

Genetic Structure
Results for population genetic structure derived from LEA admixture results ( Principal component analysis (PCA, Figure 5) of genetic data for all 178 individuals shows very little clustering into their respective populations, with most individuals forming one continuous cluster in a broad north to south pattern. The western population (08) and the most southern population (32) are separated from all other populations, though overall variation displayed is very low (PC1 = 3.9%, PC3 = 2.9%; Figure 5).
MAHF (minor allele at highest frequency, Figure 6) analysis showed an overall loss of minor alleles from north to south (population 01 to 32). There was also a loss of alleles in an east to west direction (population 01 to 08). The MAHF count is almost zero before a large peak at the western extreme (population 08), and reduces again before another smaller peak for the most southern (population 32).         Range expansion analysis ( Figure 7A) highlighted all of the northern distribution as a possible origin of expansion and the southern limits as being non-origin, suggesting a general southerly expansion pattern.
Genetic diversity estimates for the 32 NSW populations in this study (Table 2 and Figure 7) show allelic richness (AR; Figure 7B) is lowest for the western extreme (08, AR = 1.347) and most southern population (32, AR = 1.329). AR is highest at two central coast populations (12-Red Head, AR = 1.472 and 13-Barrington Tops, AR = 1.474), though there is very little difference over the whole study region. Observed heterozygosity (HO, Figure  7C    Range expansion analysis ( Figure 7A) highlighted all of the northern distribution as a possible origin of expansion and the southern limits as being non-origin, suggesting a general southerly expansion pattern.  Figure 7 for mapped highs and lows and Table 1 for site population details of populations. * Possible biased results with only 2 individuals collected, this site has been excluded from the maps in Figure 7.    Figure 7D) occurs in the northern populations ((01, P A = 222) and (02-Billinudgel, P A = 154)). Population 07 (Dorrigo) has the lowest amount of private alleles P A = 28; only two samples were sequenced for the site, so this may be a biased result. Excluding population 07 leaves the southern populations 29 (Mount Agony) and 30 (Deua National Park) with the lowest P A . Inbreeding coefficient was highest for the two most northern populations (01, F IS = 0.018 and 02, F IS = 0.020) and lowest for the most southern population (32, F IS = −0.34) and population 07 (F IS = −0.290).
Environmental niche modelling ( Figure 8) for LGM, MH, the current era and the future (2050) show a north to south movement of suitable environment for the species, with an overall decline over time.
LGM shows the largest area of suitable environment (in red) centred around southeastern QLD and northeastern NSW. The MH prediction shows suitable areas reduced by about half and located more southernly than during the LGM. The current era maintains similar levels of suitability to the MH and a continued southward shift of suitable habit. The future predication shows the species losing more suitable habitat, with only small patches of highly suitable environment shifting south. At the northern extent, suitable patches have moved westwards. centred around southeastern QLD and northeastern NSW. The MH prediction shows suitable areas reduced by about half and located more southernly than during the LGM. The current era maintains similar levels of suitability to the MH and a continued southward shift of suitable habit. The future predication shows the species losing more suitable habitat, with only small patches of highly suitable environment shifting south. At the northern extent, suitable patches have moved westwards.

Discussion
The results from our genetic analysis indicate a rapid and recent southern range expansion for F. coronata. Very little genetic structure was detected over the complete NSW range ( Figure 5), a recognised signal of expansion [60]. All analytical approaches (Figures 3-6; and Appendix A1) identified the extreme western (08) and southern (32) populations as more genetically distinct than other populations over similar distances. Our range expansion analysis ( Figure 7A) defines most of the northern distribution of this species as the origin of expansion and the tip of the southern distribution as non-origin. Although the analysis provided no clear population of origin, a consequence of very low differentiation across the species distribution, it does support a north to south expansion. Private allele counts (PA, Figure 7D) also support a north to south expansion, with the highest PA in the north and lowest in the south. Environmental niche models ( Figure 8 also support a southern shift of suitable habitat from the LGM to now, and the MAHF analysis ( Figure  6 supports a recent expansion with a gradual loss of MAHF in a north to south and east to west direction.
The increased frequency of MAHF for populations 08 and 32 are a likely signature of allele surfing [61][62][63]. Allele surfing occurs when alleles 'surf' through populations on the 'wave' or leading edge of migration. This causes alleles that are uncommon across the

Discussion
The results from our genetic analysis indicate a rapid and recent southern range expansion for F. coronata. Very little genetic structure was detected over the complete NSW range ( Figure 5), a recognised signal of expansion [60]. All analytical approaches (Figures 3-6; and Appendix A Figure 1) identified the extreme western (08) and southern (32) populations as more genetically distinct than other populations over similar distances. Our range expansion analysis ( Figure 7A) defines most of the northern distribution of this species as the origin of expansion and the tip of the southern distribution as non-origin. Although the analysis provided no clear population of origin, a consequence of very low differentiation across the species distribution, it does support a north to south expansion. Private allele counts (PA, Figure 7D) also support a north to south expansion, with the highest PA in the north and lowest in the south. Environmental niche models ( Figure 8 also support a southern shift of suitable habitat from the LGM to now, and the MAHF analysis ( Figure 6 supports a recent expansion with a gradual loss of MAHF in a north to south and east to west direction. The increased frequency of MAHF for populations 08 and 32 are a likely signature of allele surfing [61][62][63]. Allele surfing occurs when alleles 'surf' through populations on the 'wave' or leading edge of migration. This causes alleles that are uncommon across the species distribution (low minor allele frequency-MAF) to be represented at high frequencies in populations at the migration front (minor allele at highest frequency-MAHF). Declines in population densities towards the expansion front can intensify the effects of drift [62] and the chance for alleles to surf. Mutations that develop near the wave front have a higher probability of being caught by the wave [64] and all alleles that have surfed can likely be traced to an ancestor that lived on a previous wave front [65]. As a population expands and loses alleles through genetic drift, the remaining variants can be found at higher frequency in the new front than in any other population. Surfing alleles can become fixed in a newly occupied habitat at the species' range boundaries [66], leading to populations with allelic combinations found nowhere else in the range.
The dramatic difference between MAHF ( Figure 6) for populations 07 and 08 may indicate that shorter distances between origin and leading-edge populations increase the chance of surfing. The more distant southern population 32 has a much smaller increase in MAHF; however, this may be because it is not the current migration front, which is located 150 km to the south in Victoria. The Victorian population may also show a dramatic increase in MAHF.
The idea that a newly expanded population has unusual allelic frequencies appears counter intuitive and may lead to an inference of local adaptation [67], historical persistence within refugia, or even cryptic speciation. Peischl et al. [64] suggested that some studies on human adaption to local environments, e.g., [68,69], may be misinterpreted allele surfing events.
Environmental niche modelling (Figure 8), based on climatic data, supports a post-LGM southern expansion of habitat suitable to F. coronata, a southerly shift that will potentially continue into the future. The period with the greatest availability of suitable habitats appears to be the LGM (Figure 8) especially around northern NSW and southern QLD. The genetic signal of rapid expansion displayed by F. coronata is similar to that seen in rapidly expanding Sunda-derived taxa [4,19,70].

Conclusions
Our results, including the novel MAHF analysis, highlight F. coronata as an example of the little explored topic of allele surfing and another example of a rainforest species rapidly expanding distribution during the Holocene. This study has implications for conservation and management of this species, which is currently regarded as threatened in Victoria (Flora and Fauna Guarantee Act 1988-Threatened List November 2019). Although not sampled in this study (we have included collections from Brogo, 150 km north), it appears likely the F. coronata has recently expanded into eastern Victoria, possibly in the last century. The earliest herbarium collections from the region are by Wakefield in 1950 and only 46 collections have been made to date, all around Mallacoota. This study suggests the small Victorian population is more likely the product of on-going southern expansion than a relictual population in decline. The widespread practice of planting this species in bush regeneration may be introducing the species to new areas, a process that needs greater consideration in view of possible range shifts as a consequence of anthropogenic climate change.