Identification of Leishmania infantum Epidemiology, Drug Resistance and Pathogenicity Biomarkers with Nanopore Sequencing

The emergence of drug-resistant strains of the parasite Leishmania infantum infecting dogs and humans represents an increasing threat. L. infantum genomes are complex and unstable with extensive structural variations, ranging from aneuploidies to multiple copy number variations (CNVs). These CNVs have recently been validated as biomarkers of Leishmania concerning virulence, tissue tropism, and drug resistance. As a proof-of-concept to develop a novel diagnosis platform (LeishGenApp), four L. infantum samples from humans and dogs were nanopore sequenced. Samples were epidemiologically typed within the Mediterranean L. infantum group, identifying members of the JCP5 and non-JCP5 subgroups, using the conserved region (CR) of the maxicircle kinetoplast. Aneuploidies were frequent and heterogenous between samples, yet only chromosome 31 tetrasomy was common between all the samples. A high frequency of aneuploidies was observed for samples with long passage history (MHOM/TN/80/IPT-1), whereas fewer were detected for samples maintained in vivo (MCRI/ES/2006/CATB033). Twenty-two genes were studied to generate a genetic pharmacoresistance profile against miltefosine, allopurinol, trivalent antimonials, amphotericin, and paromomycin. MHOM/TN/80/IPT-1 and MCRI/ES/2006/CATB033 displayed a genetic profile with potential resistance against miltefosine and allopurinol. Meanwhile, MHOM/ES/2016/CATB101 and LCAN/ES/2020/CATB102 were identified as potentially resistant against paromomycin. All four samples displayed a genetic profile for resistance against trivalent antimonials. Overall, this proof-of-concept revealed the potential of nanopore sequencing and LeishGenApp for the determination of epidemiological, drug resistance, and pathogenicity biomarkers in L. infantum.


Introduction
Leishmania infantum is a parasitic protozoan of the order Trypanosomatida (family Trypanosomatidae), known to infect and cause severe disease in dogs and other mammals which can act, in turn, as reservoirs for zoonotic transmission to humans (further developed in Hong et al., 2020) [1]. L. infantum is transmitted as an extracellular parasite (promastigote) to a mammalian host by female infected phlebotomine sand flies. Afterward, the parasites proliferate as obligate intracellular parasites (amastigotes) in phagocytic cells and spread to different organs, causing leishmaniosis. The infection in the mammalian host can occur subclinically or with clinical signs of varying severity. In humans, L. infantum infection can

DNA Extraction and Sequencing
DNA was extracted from the three cultures using the ZymoBIOMICS DNA Miniprep Kit (D4300; Zymo Research Corporation; Los Angeles, CA, USA), following the manufacturer's recommendations. Before library preparation, extracted DNA was quantified using Qubit dsDNA BR Assay Kit (Fisher Scientific S.L; Madrid, Spain), and quality was assessed by absorbance using a NanoDrop 2000 spectrophotometer (ThermoFisher Scientific S.L; Waltham, MA, USA). The sequencing libraries were prepared using the Rapid Barcoding Sequencing Kit (SQKRBK004) from Oxford Nanopore Technologies (ONT, Oxford, UK), following the manufacturer's recommendations. Rapid sequencing barcodes were added to the tagged ends to analyze more than one sample in a single run (barcoding). As a final step, 12 µL of each library was loaded onto a flow cell for sequencing using the Mk1c (MinION, ONT, Oxford, UK) and run for 48 h.

Aneuploidy and Gene Copy Number Variation Analysis
Aneuploidy and gene copy number variation were determined by calculating the log 2 change of observed vs. expected copies (2N) over a sliding window of 100 kbp and 1 kbp, respectively. Thus, the copy ratio of 0 is 2N, 0.5 is 3N, 1 is 4N, or −1 is N. Both analyses were conducted using the LeishGenApp analysis platform (Nano1Health S.L., Bellaterra, Spain). Briefly, direct uncorrected nanopore reads were mapped against a reference using LRA v1.3.4 [25] and the read alignments were processed by SAMTools v1.10 [19]. The coverage in read alignments was screened with multiple sliding windows (as previously described) with CNVkit v0.9.8 [26].
Chromosome variation was significant by a threshold of ±0.2, as recommended elsewhere [26]. For instance, all the copy ratio values between −0.2 and 0.2 were determined as diploid (2N). Locally, nine genomic regions harboring 22 genes previously described as related to pathogenicity or drug resistance in L infantum have been used as a proof-ofconcept for CNV characterization (Table 2). L. infantum JPCM5 v2/2018 genome assembly and annotation were used as reference for CNV (http://leish-esp.cbm.uam.es/l_infantum_ downloads.html, accessed during 1 April 2022) [18]. Gene identification is conserved with previous assemblies, as described elsewhere [18]. A more stringent threshold, ±0.25, was assigned to identify variation in gene copy number than for aneuploidy detection, accounting for a greater effect of population mosaicism of smaller sliding windows. If multiple copy numbers are reported due to mosaicism, the most extreme copy number is reported by convention. Gene copy number was calculated from log 2 for each of the 22 genes. CNVs were expressed as an addition to the expected number of chromosomes or gene copies (2N). Table 2. Summary of known biomarkers for drug resistance in L. infantum. Summary of locus, gene ID, function, and reported gene copy number variation of the nine genomic regions (22 loci) analyzed related to pathogenicity or drug resistance in L. infantum. Gene ID, Start, and End correspond to the L. infantum reference genome (L. infantum JPCM5 v2/2018) [18]
Each CR sequence could be successfully reconstructed with a variety of mean coverages: 161X for MHOM/TN/80/IPT-1, 15X for MHOM/ES/2016/CATB101, 8X for LCAN/ES/2020/CATB102, and 3X for MCRI/ES/2006/CATB033. Sequence alignment and a maximum likelihood (ML) tree of the CR of the maxicircle with the Leishmania-Trypanosoma phylogeny procured by Solana et al., 2022 placed them within the phylogenetic cluster of L. infantum ( Figure S1). Moreover, as shown in Figure 1, when placing the four sequences (marked in red) in an L. donovani-L. infantum phylogeny, the novel sequences still cluster within L. infantum but are divided between the so-called JPC5 subgroup (marked in dark green) and non-JPC5 subgroup (marked in blue) clusters. MHOM/ES/2016/CATB101 is the only sequence that clusters in the alternative non-JPC5 subgroup Spanish cluster. Thus, shallow coverage (<10X) does not interfere with sequence reconstruction nor with its phylogenetic placement.

Aneuploidy Analysis
Regarding the aneuploidy analysis, the chromosomal dotation of the samples was heterogeneous ( Figure 2). Chromosome 31 was tetrasomic in all the studied samples. Total or mosaic trisomy was observed for chromosomes 5,9,11,20,21,24,25,26 (Table 3). Hence, nine, six, four, and three genes remain copyneutral (CN 0) for each sample. Variation was observed for all the genes in at least one sample. Overall, the variation in copy number compared to the diploid dotation ranged from +7 copies for the LinJ.

CNV for Drug Resistance and Pathogenicity Biomarkers in L. infantum
Miltefosine, allopurinol, trivalent antimonials, amphotericin, and paromomycin are the most used drugs used to treat leishmaniosis. As depicted in Table 2 (Table 3). Hence, nine, six, four, and three genes remain copy-neutral (CN 0) for each sample. Variation was observed for all the genes in at least one sample. Overall, the variation in copy number compared to the diploid dotation ranged from +7 copies for the LinJ.   [30,31,33]. The LACK antigen is a protein encoded by Lack1 (LinJ.28.2940) and Lack2 (LinJ.28.2970) genes, and they are related to Leishmania pathogenicity [29]. Both are located at chromosome 28. Direct uncorrected nanopore reads proved to be of sufficient quality and length to successfully assign at least 94% (max. 96%) of the total reads to L. infantum with e-value < 1× 10 −8 , further validating the technology as a feasible approach for pathogen identification. The remaining reads were assigned to members of the L. donovani-L. infantum complex (5%) or were unassigned or misassigned to other Trypanosomatidae members (1%). Differentiation between L. infantum and L. donovani can be difficult due to their high nucleotide identity (i.e., chromosome 36 is 99.16% similar between references L. donovani BPK282A1 and L. infantum JPCM5); thus, it is not striking that a small percentage of reads (5%) is assigned as L. donovani. The remaining 1% consists of a mixture of assignments to other Trypanosomatidae, to high error reads or sequences belonging to the maxicircle and minicircle kinetoplasts, for which there is no complete reference for L. infantum yet [12,21].
In addition to species identification by random sequence fragments, maxicircle kinetoplast sequences from Trypanosomatidae have been previously described as a suitable molecular marker for species [11,21] and strain [12] phylogenies, akin to mitochondrial sequences from the kingdoms Plantae, Fungi, or Animalia. As shown in Figure S1, reconstruction of the L. infantum conserved region (CR) of the maxicircle is possible through guided consensus assembly with nanopore sequencing reads, with as little coverage as with a mean of 3X. Such new CR sequences are similar in quality to those previously published from L. infantum, as they cluster together in a Leishmania-Trypanosoma phylogeny. Moreover, as shown in

Detection of Aneuploidy and Gene Copy Number Variation
Leishmania parasites mostly rely on aneuploidy and DNA CNVs to regulate the expression of stress response genes to, e.g., temperature, acidity, or drugs [7,34,35]. As presented in the Results section, direct uncorrected nanopore reads have been suitable to detect aneuploidy, including chromosome mosaicism, and previously described CNV related to genetic drug resistance biomarkers.
Notably, local CNVs of targeted chromosome regions were more common across chromosomes in all four samples than aneuploidy. These genome plasticity strategies are a good solution for transcript regulation for an organism that lacks promoter-dependent regulation [6]. However, with this approach, it is not possible to discern whether this local gene amplification is intrachromosomal, maintained as extrachromosomal (linear or circular) molecules [34], or an expansion of an entire chromosome. Therefore, further studies to conclude its physical conformation should be conducted.
Remarkably, when comparing the ploidy of the four samples, MCRI/ES/2006/CATB033 has the smallest number of aneuploidies. Furthermore, it is the only sample where gene loss was quantified at the chromosomal level (sample is mosaic for a monoploidy of chromosome 13). It is noteworthy that MCRI/ES/2006/CATB033, despite being isolated from a dog, was propagated in hamsters (Cricetus aureus), in vivo, in contrast with the other three samples, which were only maintained as in vitro cultured promastigotes. MHOM/TN/80/IPT-1 had the largest number of aneuploidies (n = 14), probably linked to its long propagation and culture history, as it was isolated in 1980. Such long culture history may have adapted that strain to culturing conditions, as it is known that parasite isolation and subsequent in vitro parasite maintenance are strong drivers for chromosome and gene copy number variation [36,37]. Moreover, according to Domagalska et al., aneuploidy was much lower in amastigotes (intracellular stage of the parasite) than in cultivated promastigotes (extracellular) [37]. Our results are consistent with previous studies, given that the closest diploid karyotype was found in the in vivo MCRI/ES/2006/CATB033. Otherwise, MHOM/TN/80/IPT-1 showed the highest number of aneuploidies due to the in vitro effect on ploidy variation [37]. Considering changes in chromosome copy number as a highly common feature during experimental selection, the results support the importance of minimizing culture laboratory passaging or direct clinical samples when studying aneuploidy and CNV as genomic biomarkers.
Local detection of CNVs in the 22 genes studied revealed possible pharmacoresistance in our sample collection. A deletion in LdMT and/or ldRos3 (CN −1) is related with a 2-and 1.6-fold decrease in miltefosine sensitivity, respectively [32]. According to this genetic biomarker, samples MHOM/TN/80/IPT-1 and MCRI/ES/2006/CATB033 have the genetic potential to be pharmacoresistant to miltefosine. Additionally, the same samples have the genetic potential to have allopurinol pharmacoresistance as a deletion (CN −1) in the METK1 gene was quantified [10]. Regarding biomarkers for trivalent antimonial resistance, all four strains showed genetic potential for pharmacoresistance since an additional copy (CN +1) of the MRPA gene in the H locus is related to an increase in resistance [28]. MRPA CNVs ranged from +1 in MCRI/ES/2006/CATB033 to CN +5 in MHOM/TN/80/IPT-1. Finally, an extra copy of D-LDH and B-CAT genes is linked with a 4.87 and 4.08-fold increase in paromomycin resistance, respectively [30]. Potential genetic pharmacoresistance could be detected in MHOM/ES/2016/CATB101 and in LCAN/ES/2020/CATB102 since mosaic expansions (CN 0, +1) were found in B-CAT in both strains and D-LDH in the latter. A larger cohort of samples and phenotypic data (resistance) are required to validate the clinical relevance of these genetic pharmacoresistance profiles. A current limitation of this methodology is the elucidation of the physical conformation of aneuploidies and gene CNs (chromosomal or extrachromosomal). Despite not being relevant for possible genetic pharmacoresistance or virulence, this difference is relevant for the transmission of virulence, as described for Leishmania spp. Small extrachromosomal circles with virulence genes can be readily transmitted through vesicle transport to neighboring parasites [34]. Thus, an exhaustive analysis should be carried out in further studies to overcome this limitation.

Conclusions
Direct uncorrected nanopore reads were obtained from four samples of Leishmania infantum (MHOM/TN/80/IPT-1, MHOM/ES/2016/CATB101, LCAN/ES/2020/CATB102, and MCRI/ES/2006/CATB033). Those reads were successfully used to (i) identify the species in culture, (ii) type the parasite's group, and (iii) identify potential biomarkers of pharmacoresistance or virulence, reducing the computational cost and time in comparison to other strategies (i.e., assembly). Additionally, the longer read lengths than other sequencing alternatives (i.e., Illumina sequencing) permitted a reconstruction of the conserved region of the maxicircle sequences with as little as 3X coverage. Likewise, direct uncorrected nanopore reads provided sufficient coverage to identify chromosomal aneuploidies. These findings supported the presence of intrastrain mosaicism and interstrain diversity in our samples. Moreover, this methodology was able to determine the CNV status of 22 genes (divided in 10 loci) related to pharmacoresistance and virulence with shallow coverage (>5X). The analysis of additional strains with available phenotype data will be needed to validate LeishGenApp and the methodology presented here. Furthermore, the analyses of aneuploidy and CNV directly from clinical samples, coupled with in vitro drug resistance and pathogenicity tests, would help decipher the selected regions' suitability as biomarkers for L. infantum.