Genetic Assignment Tests to Identify the Probable Geographic Origin of a Captive Specimen of Military Macaw ( Ara militaris ) in Mexico: Implications for Conservation

: The Military Macaw ( Ara militaris ) faces a number of serious conservation threats. The use of genetic markers and assignment tests may help to identify the geographic origin of captive individuals and improve conservation and management programs. The purpose of this study was to identify the possible geographic origin of a captive individual using genetic markers. We used a reference database of genotypes of 86 individuals previously shown to belong to two different genetic groups to determine the genetic assignment of the captive individual of unknown origin (captive specimen) and ﬁve individuals of known geographic origin (as positive controls). We evaluated the accuracy of three assignment/exclusion criteria to determine the success of correct assignment of the individual of unknown origin and the ﬁve positive control individuals. WICHLOCI estimated that eight loci were required to achieve an assignment success of 83%. The correct geographic origin of positive controls was identiﬁed with 83% conﬁdence. All of the analyses assigned the captive individual to the genetic group from the Sierra Madre Oriental. Bayesian assignment tests, tests for genetic distance and allele frequency tests assigned the unknown individual to the locations from the Sierra Madre Oriental with a probability of 71.2–82.4%. We show that the use of genetic markers provides a promising tool for determining the origin of pets and individuals seized from the illegal animal trade to better inform decisions on reintroduction and improve conservation programs.


Introduction
The order Psittaciformes contains some of the most charismatic and recognizable bird species in the world [1]. However, of the order's approximately 352 species, 26% face some degree of extinction risk [2]. For example, out of the 22 Psittacidae species recorded in Mexico [1,3], 20 are at risk according to Mexican law [4], and at the international level, the Red List of the International Union for the Conservation of Nature (IUCN) places eight of those species in some risk category [5]. The Military Macaw (Ara militaris) is one of the endangered psittacid species in Mexico, and faces two main threats: (1) habitat transformation (loss, fragmentation and degradation) [6,7], and (2) illegal collection for the national and international illegal pet trade [7][8][9][10][11][12][13][14][15]. Indeed, illegal trafficking has led to the extirpation of populations from conserved areas [11,12,16].
In Mexico, the illegal wildlife trade has threatened 19 out of the 22 Psittacidae species [7]. The capture of any wild Psittacidae species was outlawed in Mexico in 2003, Diversity 2021, 13, 245 2 of 13 and the current General Wildlife Law (LGVS) prohibits the extraction of psittacid species, only granting permits for conservation or scientific research purposes [17]. One of the objectives of the LGVS was to guide management efforts, including the recovery, reproduction, research, release, and/or relocation of individuals [18]. One of the problems faced by reintroduction and recovery efforts is that in most cases, the geographic origin of animals recovered from the illegal pet trade is unknown. Information on the geographic origin of rehabilitated individuals is crucial in order to avoid mixing individuals from genetically distinct populations, which can lead to genetic problems (e.g., local maladaptation and outbreeding depression) [19][20][21][22].
Molecular tools make it possible to answer questions concerning evolutionary history, define taxonomic uncertainties, and identify release locations using molecular markers (e.g., microsatellites) and statistical approaches [23][24][25][26]. However, these techniques are not often used for the identification of release locations for rehabilitated birds illegally taken from the wild [21,27]. The use of molecular tools to establish the origin of individuals for conservation purposes is increasing in reintroduction plans and for identifying illegal trade sites, as demonstrated by studies of several species, such as the Hyacinth Macaw (Anodorhynchus hyacinthinus) [28], the Blue-and-Yellow Macaw (Ara ararauna) [21] and the European Pond Turtle (Emys orbicularis) [29].
The purpose of this study was to determine the probable geographic origin of a captive Military Macaw of unknown origin using different molecular statistical analyses and test the accuracy of these techniques using individuals of known origin, in order to generate a protocol that can be used for reintroduction programs, for management and conservation.
The Military Macaw is one of the most charismatic species in the New World. Its distribution is fragmented, ranging from northern Mexico to northwestern Argentina [1,6,30]. In Mexico, the Military Macaw is distributed in apparently isolated colonies in two separate areas. One includes the Sierra Madre Occidental and the Sierra Madre del Sur (from southern Sonora to Chiapas); the other is in the Sierra Madre Oriental, where the macaws are reported in Tamaulipas, San Luis Potosi, Guanajuato, and Querétaro [31,32]. The geographic distribution of the Military Macaw in Mexico declined by 43% over 16 years (2000-2016) [10,13,14]. It is endangered under Mexican law [4,13], vulnerable on the IUCN Red List [5], and listed in Appendix I of the Convention on International Trade in Species [33].
A study conducted by Rivera-Ortíz et al. [34] on the genetics of the Military Macaw used microsatellites from samples collected in seven Mexican locations and found strong genetic structuring, showing two groups presenting geographic concordance. Group one corresponded to the locations found in the Sierra Madre Occidental and the Sierra Madre del Sur (Pacific slope), and Group two corresponded to the locations found in the Sierra Madre Oriental (Gulf of Mexico slope). Given these results, the authors proposed the protection of the two genetic groups found in the three physiographic regions as independent conservation units. In this study, we used both classification (correspondence analysis) and genetic assignment methods to evaluate whether the Military Macaw individual of unknown origin belonged to any of those previously identified genetic groups, and if possible, assign it to a particular location.

DNA Extraction and Genotyping
We extracted DNA from five feathers collected from a captive individual housed at the AFP OCEAN Foundation A.C. The rachises of the collected feathers were cleaned with 75% molecular grade ethyl [34]. This same procedure was performed with positive control samples from five individuals of known geographic origin.
Total genomic DNA was extracted using the standard digestion protocol with Proteinase-K/sodium dodecyl sulfate (SDS), followed by chloroform and alcohol purification as described by Leeton and Christidis [35]. Eight loci were amplified from nuclear microsatellites using primers designed for other parrot species (five for Ara ararauna and Diversity 2021, 13, 245 3 of 13 three for Amazona guildinguii) [36][37][38]. These polymorphic microsatellites were those previously used by Rivera-Ortíz et al. [34] for Military Macaw individuals from seven different locations.
These eight loci were amplified by polymerase chain reaction (PCR) according to the parameters used by Rivera-Ortíz et al. [34]. Electrophoresis was carried out using an ABI PRISM 3100 Avant sequencer (Applied Biosystems) with Gene Scan LIZ 500 to determine fragment size. Fragments and their final size were analyzed using GENE MAPPER 4.0 software (Applied Biosystems). Since this program automatically determines allele size, we visually checked the electropherograms of microsatellites from the eight loci to corroborate their size and number. PCR sequences were repeated for samples with unclear electropherograms to resolve uncertainties [39].

Data Analysis
Genetic assignment analyses were conducted using the genotypes of 86 individuals from seven locations grouped into the two genetic groups reported by Rivera-Ortíz et al., [34]. Those genetic groups considered the candidate places of origin of the captive specimen of unknown origin and the five specimens of known origin (one individual from Sinaloa, two from Nayarit, one from Oaxaca and one from Tamaulipas).
Theoretical studies have examined how the number of loci and alleles relate to the success of assignment [40,41]. WICHLOCI 1.0 software [41] chooses the best combination of loci to assign a captive individual by analyzing the data extracted from candidate locations. Combinations of the eight microsatellites were used to test the minimum loci needed for a successful assignment [41,42].

Factorial Correspondence Analysis (FCA)
First, we did a factorial correspondence analysis (FCA) using GENETIX 4.05.4 software [43]. This analysis is a multivariate interdependence statistical method that is well adapted to describe associations between variables [44] and provides a graphical display of the genetic relationships between the individuals of interest and those of the reference populations in a multidimensional space based on allelic data. FCA was performed using three data combinations: (i) positive controls + unknown individual + individuals from the Sierra Madre Occidental/Sierra Madre del Sur populations + individuals from the Sierra Madre Oriental populations and (ii) positive controls + unknown individual + individuals from the Sierra Madre Oriental. These combinations were created to determine whether any differences existed between the unknown individual and the locations or candidate genetic groups.

Genetic Assignment Analysis
A Bayesian approach was used to assign the unknown individual and positive controls to genetic groups or populations, implemented in STRUCTURE 2.3.1 software [45,46]. This approach was designed to infer the number of genetic groups or populations of individuals (K) according to their genotypes and estimate the proportional membership of each individual's genotype to one or more of the inferred genetic clusters.
We used the results of the population structure analysis of Military Macaws from Rivera-Ortíz et al. [34], which identified two genetic groups: (1) Sierra Madre Occidental/Sierra Madre del Sur populations and (2) Sierra Madre Oriental populations. Within these groups, we explored the possibility of hierarchical structuring as recommended by Jombart [47]. We repeated these analyses until no additional structure was found within clusters, i.e., until the optimal K value was 1. The burn-in length for each repetition consisted of 500,000 steps, followed by 10,000,000 iterations, under the admixture assumption. No clear substructure was detected in the two genetic groups reported by Rivera-Ortíz et al. [34] (Supplementary Materials, Figure S1). A similar analysis, the discriminant analysis of principal components (DAPC) also showed that the studied seven candidate locations grouped into two inferred clusters (K = 2), according to Bayesian information criterion  Figure S2). Therefore, we assumed that there was no hierarchical structure and we continued the genetic assignment analysis under these conditions.
To assign the unknown individual and positive controls of known origin back to their genetic groups, we used the USEPOPINFO function within an admixture framework. We performed this analysis with all of the controls and the two genetic groups.
We used 10 repetitions in a range of K = 1 to K = 10. The burn-in length for each repetition consisted of 500,000 steps, followed by 10,000,000 iterations, under the admixture assumption in order to determine the maximum value of the posteriori probability (lnP (D)), to detect the true K [45]. CLUMPP 1.1.2 software [48] was used to eliminate label switching, using the greedy algorithm with 1,000 random input orders. These values were visualized using bar plots prepared with DISTRUCT software [49], which showed how the test individuals were assigned relative to the grouping of the reference set of individuals, and to determine the probability of their assignment to one of the two genetic groups identified by Rivera-Ortíz et al. [34].
Genetic relationships between the unknown individual and positive controls and the genetic groups/locations were also examined by applying discriminant analysis of principal components (DAPC) [47] using the "adegenet 2.1.3" package [26] in R 4.0.5 software [50], with the number of principle components set to 35 following alpha-score indication. DAPC is a multivariate, model-free approach designed to generate clusters based on prior population information [26]. DAPC allowed us to analyze the population structure by assigning the unknown individual and positive controls to the genetic groups or locations.
We used three genetic assignment/exclusion approaches implemented in GENECLASS 2.0 software [51]. The analyses were carried out for the two genetic groups and for each of the locations that contain them. The first approach used allele frequencies [52]; the unknown individual and positive controls were assigned to the genetic groups and candidate locations where each of their genotypic frequencies was expected to be the highest. We calculated the probability of the genotype of the controls and then applied the simulation algorithm proposed by Paetkau et al. [53] with a Monte Carlo (MC) resampling of 10,000 steps and an exclusion threshold of p < 0.05. In the second approach, we used a partially Bayesian test based on Rannala and Mountain [54], which estimates the population's allele frequencies and individual assignment's statistical significance (unknown individual and positive controls). We used the simulation algorithm proposed by Paetkau et al. [53] with an MC resampling of 10,000 steps and an exclusion threshold of p < 0.05. In the third approach, we calculated the genetic distance between the unknown individual and candidate groups and locations. The same was done for the positive controls [55].
We used three measurements of genetic distance: (1) Nei's minimum genetic distance [56], (2) Nei's DA genetic distance [57], and (3) the simulation algorithm proposed by Paetkau et al. [53] with an MC resampling of 10,000 steps and an exclusion threshold of p < 0.05. A self-assignment of all of the individuals from the reference populations was also performed, with an exclusion threshold of p < 0.05, with an MC resampling of 10,000 steps.

Results
The summary statistics per locus and per-locations locus, positive controls and unknown captive individual are given in the Supplementary Materials (Tables S1 and S2).
The assignment scores estimated by WICHLOCI 1.0 software indicated that the contribution to the genetic assignment of each of the eight loci varied between 9.0% (locus UnaCT21) and 15.55% (locus UnaCT21). Using eight loci produced the highest score, assigning the captive specimen to the candidate populations with an accuracy of 83% ( Table 1). The eight loci were used in all subsequent assignment/exclusion tests to improve the assignment success for the unknown captive individual and positive controls.

FCA
Using the combination of the positive controls + unknown individual + individuals from the Sierra Madre Occidental/Sierra Madre del Sur locations + individuals from the Sierra Madre Oriental locations, the FCA produced a data cloud showing the position of each of the 86 individuals from the candidate genetic groups in a two-dimensional space (taken from the study by Rivera-Ortíz et al. [34]) including the unknown individual and the positive controls. Two distinct genetic groups were differentiated in the data cloud: (1) individuals from the candidate locations from the Sierra Madre Occidental/Sierra Madre del Sur and (2) individuals from the candidate locations from the Sierra Madre Oriental (Figure 1). We observed that the positive controls of the individuals of Sinaloa, Nayarit, and Oaxaca were located in the point cloud of the candidate locations from the Sierra Madre Occidental/Sierra Madre del Sur. In contrast, the unknown individual and the positive control from Tamaulipas were located in the point cloud of the candidate locations from the Sierra Madre Oriental (Figure 1). UnaCT21) and 15.55% (locus UnaCT21). Using eight loci produced the highest score, assigning the captive specimen to the candidate populations with an accuracy of 83% ( Table  1). The eight loci were used in all subsequent assignment/exclusion tests to improve the assignment success for the unknown captive individual and positive controls.

FCA
Using the combination of the positive controls + unknown individual + individuals from the Sierra Madre Occidental/Sierra Madre del Sur locations + individuals from the Sierra Madre Oriental locations, the FCA produced a data cloud showing the position of each of the 86 individuals from the candidate genetic groups in a two-dimensional space (taken from the study by Rivera-Ortíz et al. [34]) including the unknown individual and the positive controls. Two distinct genetic groups were differentiated in the data cloud: (1) individuals from the candidate locations from the Sierra Madre Occidental/Sierra Madre del Sur and (2) individuals from the candidate locations from the Sierra Madre Oriental ( Figure 1). We observed that the positive controls of the individuals of Sinaloa, Nayarit, and Oaxaca were located in the point cloud of the candidate locations from the Sierra Madre Occidental/Sierra Madre del Sur. In contrast, the unknown individual and the positive control from Tamaulipas were located in the point cloud of the candidate locations from the Sierra Madre Oriental (Figure 1).  The combination of the positive controls + unknown individual + individuals from the Sierra Madre Oriental showed that the unknown individual was placed within the data cloud corresponding to the Sierra Madre Oriental candidate locations and was closest to the individuals from the Querétaro location. The positive control of the individual from Tamaulipas was also placed in this cloud but was closer to the individuals from the location from Tamaulipas. In contrast, positive control individuals from Sinaloa, Nayarit, and Oaxaca formed a cloud of points that was quite distant from the individuals from the reference locations in the Sierra Madre Oriental (Figure 2). ty 2021, 13, x FOR PEER REVIEW 6 of 13 The combination of the positive controls + unknown individual + individuals from the Sierra Madre Oriental showed that the unknown individual was placed within the data cloud corresponding to the Sierra Madre Oriental candidate locations and was closest to the individuals from the Querétaro location. The positive control of the individual from Tamaulipas was also placed in this cloud but was closer to the individuals from the location from Tamaulipas. In contrast, positive control individuals from Sinaloa, Nayarit, and Oaxaca formed a cloud of points that was quite distant from the individuals from the reference locations in the Sierra Madre Oriental (Figure 2).

Genetic Assignment
STRUCTURE showed the two genetic groups (K = 2) reported previously by Rivera-Ortíz et al. [34]. The results of this analysis (Figure 3) indicates that the positive control individuals were correctly assigned to the locations corresponding to their known geographic origins. The unknown individual was assigned to the genetic group of the Sierra Madre Oriental with a genetic allocation ratio of 97.1% to 99.6 % ( Table 2).

Genetic Assignment
STRUCTURE showed the two genetic groups (K = 2) reported previously by Rivera-Ortíz et al. [34]. The results of this analysis (Figure 3) indicates that the positive control individuals were correctly assigned to the locations corresponding to their known geographic origins. The unknown individual was assigned to the genetic group of the Sierra Madre Oriental with a genetic allocation ratio of 97.1% to 99.6% (Table 2). DAPC supported STRUCTURE, identifying two genetic groups (K = 2) (Figure 4), according to Bayesian information criterion (BIC). The DAPC plot also reflects the assignment probabilities of the positive control individuals as well as the individual of unknown origin to the Sierra Madre Oriental (Figure 4).

Genetic Assignment
STRUCTURE showed the two genetic groups (K = 2) reported previously by Rivera-Ortíz et al. [34]. The results of this analysis (Figure 3) indicates that the positive control individuals were correctly assigned to the locations corresponding to their known geographic origins. The unknown individual was assigned to the genetic group of the Sierra Madre Oriental with a genetic allocation ratio of 97.1% to 99.6 % ( Table 2).   DAPC supported STRUCTURE, identifying two genetic groups (K = 2) (Figure 4), according to Bayesian information criterion (BIC). The DAPC plot also reflects the assignment probabilities of the positive control individuals as well as the individual of unknown origin to the Sierra Madre Oriental (Figure 4). In the four approaches implemented by GENECLASS, positive controls from Sinaloa, Nayarit and Oaxaca were assigned to the Sierra Madre Occidental/Sierra Madre del Sur genetic group, with values ranging from 70% to 82.4%. The positive control from Tamaulipas and the unknown individual were assigned to the genetic group from the Sierra Madre Oriental with values of 71.2% to 82.4% (Table 3). In the four approaches implemented by GENECLASS, positive controls from Sinaloa, Nayarit and Oaxaca were assigned to the Sierra Madre Occidental/Sierra Madre del Sur genetic group, with values ranging from 70% to 82.4%. The positive control from Tamaulipas and the unknown individual were assigned to the genetic group from the Sierra Madre Oriental with values of 71.2% to 82.4% (Table 3).
The unknown captive individual was assigned to the candidate locations from Tamaulipas and Querétaro (Table 4). The analysis of Nei's genetic distances and the minimum Nei distance proposed by Cournuet et al. [55][56][57] showed the probabilities of assignment of the unknown individual to the locations from Querétaro and Tamaulipas (Table 4). The partially Bayesian approach taken by Rannala and Mountain [54] and the allele frequency approach described by Paetkau et al. [53] both assigned the unknown individual to the location from Querétaro, with probabilities of 32.7% and 30.3%, respectively (Table 4). All positive control individuals were assigned correctly to each of their locations of origin; the four approaches showed assignment probabilities between 29.8% and 55.0% for individuals from Sinaloa, Nayarit, Oaxaca, and Tamaulipas (Table 5).   Self-assignment tests of the candidate locations panel correctly assigned between 65.1% and 90.6% of the individuals to their population of origin (Table 6).

Discussion
Identification of an individual's geographic origin by means of genetic analysis depends on the ability to assign it to a particular location, which in turn depends on the level of genetic structuring among reference locations [58]. Here, we tested the ability of molecular genetic assignment to identify the likely location of origin of one individual of the Military Macaw of unknown origin and five individuals of known origin, in order to evaluate the method's utility in future conservation efforts for the species. In this study, the results showed that the methods tested were useful in identifying the geographic areas from which individuals likely originated, for both the unknown individual and the five positive controls.
The results of the FCA, STRUCTURE and DAPC tests grouped the unknown individual with the Sierra Madre Oriental genetic group with high confidence. However, it was impossible to assign it to a specific geographic location because there is no differentiation between individuals from different reference locations in this genetic group, indicating gene flow. These FCA, STRUCTURE and DAPC results are reliable because the reference sample of individuals used in the study and provided by Rivera-Ortíz et al. [34] presents a marked genetic structure and differentiation across the distribution range of the Military Macaw in Mexico, showing a pattern that was also found by Eberhard et al. [59] with mitochondrial markers. These previously documented patterns of genetic structure are important in the context of the present study because structure and differentiation among the reference locations must be high if there is to be reasonable success in geographic allocation using grouping methods (with 80-100% correct allocation) [26,60,61].
The allocation/exclusion analyses carried out using GENCLASS suggest that the likely origin of the unknown individual is the Sierra Madre Oriental metapopulation, as determined by the grouping analyses. Three of the four criteria used for the allocation/exclusion analyses show some probability that the unknown individual belongs to the Querétaro location, although with relatively low certainty (30-46% probability). These low probability values should be interpreted with caution, since they may be affected by small sample sizes in some of the reference locations. Some authors suggest that a sample of 30-50 individuals per reference location is necessary to allow accurate estimates [26,55,58]. Unfortunately, obtaining large sample sizes in studies of endangered species is extremely difficult due to small population sizes, restricted areas, and difficulty accessing their distributional areas [13,62,63], as in the case of the Military Macaw.
The different methods used to identify the probable location of origin of the individual Military Macaw of unknown origin proved to be effective and complementary, as demonstrated in this study. When carrying out this type of analysis, we recommend graphically showing the genetic similarity of the individuals as a first step that reveals if the samples of unknown origin are grouped in the reference localities [64]. Then, consider a Bayesian approach to determine the probability that the individuals of unknown origin originate from a population, considering all reference localities together [26]. Finally, use the tests to exclude or identify individuals of unknown origin in the reference localities, to determine the probability that individuals of unknown origin are rejected or belong to the reference localities [65].
Our study shows that given the degree of population genetic structure in Military Macaw locations in Mexico, it is possible to use microsatellite data to identify the probable location of origin of an individual of unknown provenance. This, in turn, makes it possible to make a more informed selection of locations at which the individual could be released. The captive specimen was geographically assigned to the Sierra Madre Oriental, and according to our results, is a candidate for release in that zone. It is essential that the programs for reintroducing and releasing Military Macaw individuals into the wild make proper use of this kind of molecular tools [42,66,67], given that for an endangered species, such as the Military Macaw, the strong genetic structuring of wild locations may reflect local adaptations that would be lost if they were to be managed as a single group [34].
To improve the accuracy of assigning individuals of unknown origin to their correct populations, it is crucial to continue genetic studies of wild locations and increase the number of molecular markers used in genetic analyses. Relatively low numbers of microsatellites were used in this study, but microsatellites have provided sufficient power for geographic assignment of a variety of wild species due to their high level of polymorphism and genetic structure of the populations [68][69][70]. The use of other markers such as mitochondrial DNA would be very informative and complementary since it might allow us to distinguish lineages that correspond to particular geographic areas [21]. Identification of single nucleotide polymorphisms (SNPs) from genomic data also have a significant advantage for geographic assignment, since information from hundreds or thousands of SNPs could potentially provide improved resolution of patterns of genetic structure, and thus, the more precise assignment of an individual's geographic origin [61].
Our study demonstrates that in combination with the reference samples analyzed by Rivera-Ortiz et al. [34], currently available molecular markers and statistical assignment and exclusion software can help identify the geographic origin of captive individuals or specimens confiscated from illegal trade [50]. No studies have been conducted to analyze the number of Military Macaw individuals trapped each year, but Cantú et al. [16] estimated that 65,000 to 78,000 psittacid individuals are poached for illegal trade and suffer a mortality rate of 77%. Only about 2% of poached individuals are seized by the Mexican Federal Environmental Protection Agency (PROFEPA) [16], but given how many are poached, this small percentage still represents several hundred individuals. In this sense, identifying the geographic origin of captive individuals or specimens confiscated from illegal trade helps biodiversity managers to detect locations with intense poaching, and thus, focus efforts and resources on these sites to prevent poaching. It will also support and guide restoration or demographic translocation programs if they are deemed necessary to increase genetic variability [23,28].
A crucial component of this study was the availability of the set of reference samples of known geographic origin [34]. We recommend the establishment of large DNA reference collections and large public databases containing allele frequencies from many populations, and the use of museum collections, which can play an essential role since DNA can be extracted from museum skins. Any genetic analysis that attempts to identify geographic origin of an individual/sample depends on having good data on georeferenced genetic variation. These databases would be extremely valuable in efforts to conserve endangered species [26], by helping to detect and reduce illegal trade and informing conservation management plans.