The Complete Plastid Genome of Artocarpus camansi: A High Degree of Conservation of the Plastome Structure in the Family Moraceae

Understanding the plastid genome is extremely important for interpreting the genetic mechanisms associated with essential physiological and metabolic functions, identifying possible marker regions for phylogenetic or phylogeographic analyses, and elucidating the modes through which natural selection operates in different regions of this genome. In the present study, we assembled the plastid genome of Artocarpus camansi, compared its repetitive structures with those of Artocarpus heterophyllus, and searched for evidence of synteny within the family Moraceae. We also constructed a phylogeny based on 56 chloroplast genes to assess the relationships among three families of the order Rosales, that is, the Moraceae, Rhamnaceae, and Cannabaceae. The plastid genome of A. camansi is 160,096 bp long and presents the typical circular quadripartite structure of angiosperms, comprising a large single copy (LSC) region of 88,745 bp and a small single copy (SSC) region of 19,883 bp, separated by a pair of inverted repeat (IR) regions, each 25,734 bp long. The total GC content was 36.0%, which is very similar to that of Artocarpus heterophyllus (36.1%) and other moraceous species. A total of 23,068 codons and 80 SSRs were identified in the A. camansi plastid genome, with the majority of the SSRs being mononucleotide repeats (70.0%). A total of 50 repeat structures were observed in the A. camansi plastid genome, in contrast with 61 repeats in A. heterophyllus. A purifying selection signal was found in 70 of the 79 protein-coding genes, indicating that these genes have been highly conserved throughout the evolutionary history of the genus. The comparative analysis of the structural characteristics of the chloroplast genome among different moraceous species found a high degree of sequence similarity, indicating that these plastid genomes have followed a highly conserved evolutionary pattern.
The phylogenetic analysis also recovered a high degree of similarity between the chloroplast genes of A. camansi and A. heterophyllus, and reconfirmed the hypothesis of the intense conservation of the plastome in the family Moraceae.


Introduction
The chloroplast, which has an independent circular genome, is an essential organelle in higher plants and plays a crucial role in the processes of photosynthesis and carbon fixation [1,2]. The plastid genome (cpDNA) of the angiosperms is highly conserved in the structure, order, and composition of its genes in comparison with the nuclear and mitochondrial genomes [3,4]. This, together with its maternal phylogenetic analyses, and have been sequenced successfully in a number of different genera using target enrichment [23,30].
The goals of this study were to assemble the complete plastid genome of A. camansi from whole-genome sequence data, compare its repetitive structures with those of A. heterophyllus, and verify the plastome structure and synteny among members of the family Moraceae. We also constructed a plastid phylogenomic tree to explore the relationships among three families (Moraceae, Rhamnaceae, and Cannabaceae) of the order Rosales.

Sampling, Genome Assembly, and Annotation
Illumina paired-end sequencing data of A. camansi were obtained from the NCBI Sequence Read Archive (accession no. SRR2910988). The plant sampling, library preparation, and parameters used for high-throughput sequencing are described in Gardner et al. [29]. The paired-end reads were assembled into a complete plastid genome using the Fast-Plast pipeline v.1.2.8 [31], with the -subsample option set to 45,000,000 and the order Rosales as the bowtie_index. The assembly was curated by aligning the sequence reads back to the assembled plastid genome with Bowtie2 [32]. The alignments were converted into binary BAM format, sorted, and indexed using SAMtools [33]. The genome coverage was then estimated from the BAM alignments using the genomeCoverageBed command of BEDTools [34].
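The coverage statistics reported in the results (mean, standard deviation, median, minimum, and maximum depth) can be derived from per-base depth values. The sketch below is a minimal illustration. It assumes three-column, tab-separated per-base output (reference name, position, depth), such as the per-base mode (`-d`) of genomeCoverageBed produces. The exact options used in this study are not stated. Treating the standard deviation as a population estimate is also an assumption, and the function name `coverage_summary` is hypothetical.

```python
import statistics
from typing import Dict, Iterable


def coverage_summary(depth_lines: Iterable[str]) -> Dict[str, float]:
    """Summarize per-base depth lines formatted as 'chrom<TAB>position<TAB>depth'."""
    depths = []
    for line in depth_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        depths.append(int(fields[2]))
    if not depths:
        raise ValueError("no depth records found")
    return {
        "mean": statistics.fmean(depths),
        "sd": statistics.pstdev(depths),  # population SD across all positions (assumed)
        "median": statistics.median(depths),
        "min": min(depths),
        "max": max(depths),
    }


if __name__ == "__main__":
    # Toy input; in practice these lines would be read from the per-base coverage file.
    toy = ["cp\t1\t10", "cp\t2\t20", "cp\t3\t30"]
    print(coverage_summary(toy))
```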

Characterization of Repeat Sequences
The REPuter server was used to detect and locate forward, reverse, palindromic, and complementary repeat sequences with a minimum size of 30 bp, a Hamming distance of 3, and at least 90% identity [39]. Microsatellites, also known as SSRs (simple sequence repeats), were detected using MISA v2.1 (available online at: http://pgrc.ipk-gatersleben.de/misa/misa.html) [40,41]. The SSR search used the following minimum numbers of repeat units: 10 for mononucleotides, 5 for dinucleotides, 4 for trinucleotides, and 3 for tetra-, penta-, and hexanucleotides. For the comparative analysis, repeats were characterized in the plastid genomes of A. camansi and A. heterophyllus, the phylogenetically closest species with a plastid genome available in NCBI.
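The SSR thresholds above translate directly into a scan for perfect repeats. The following is a simplified sketch, not MISA itself. It reports only perfect SSRs that meet the minimum repeat numbers listed above and scans greedily from left to right. It rejects motifs that are themselves repeats of a shorter unit, so that a poly-A run is not also reported as a dinucleotide. It ignores the compound SSRs that MISA can report. The names `find_ssrs`, `is_primitive`, and `MIN_REPEATS` are illustrative.

```python
from typing import List, Tuple

# Minimum number of repeat units per motif length, as listed in the text.
MIN_REPEATS = {1: 10, 2: 5, 3: 4, 4: 3, 5: 3, 6: 3}


def is_primitive(motif: str) -> bool:
    """True if the motif is not itself a repeat of a shorter unit (e.g. 'ATAT')."""
    n = len(motif)
    return all(motif != motif[:d] * (n // d) for d in range(1, n) if n % d == 0)


def find_ssrs(seq: str) -> List[Tuple[int, str, int]]:
    """Return (0-based start, motif, repeat count) for perfect SSRs, scanning left to right."""
    seq = seq.upper()
    hits = []
    i = 0
    while i < len(seq):
        for k in sorted(MIN_REPEATS):
            motif = seq[i:i + k]
            if len(motif) < k or not set(motif) <= set("ACGT") or not is_primitive(motif):
                continue
            repeats, j = 1, i + k
            while seq[j:j + k] == motif:  # count consecutive copies of the motif
                repeats += 1
                j += k
            if repeats >= MIN_REPEATS[k]:
                hits.append((i, motif, repeats))
                i = j - 1  # resume scanning right after this SSR
                break
        i += 1
    return hits
```

Coordinates are 0-based. Adding 1 gives the 1-based positions usually reported in SSR tables.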

Non-Synonymous (Ka) and Synonymous (Ks) Substitution Rate Analysis and Nucleotide Diversity Analysis
To estimate the non-synonymous (Ka) and synonymous (Ks) substitution rates, each of the 79 protein-coding genes shared by A. camansi and A. heterophyllus was aligned separately using MAFFT [42] as implemented in Geneious 11.0.4 (Biomatters Ltd., Auckland, NZ). The Ka and Ks values and the Ka/Ks ratio were then estimated for each gene using DnaSP v6.12.03 [43].
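To make the Ka and Ks quantities concrete, the sketch below computes them for a pair of aligned coding sequences. It assumes the Nei and Gojobori (1986) counting method with a Jukes-Cantor correction, a common choice for pairwise Ka/Ks. Details may differ from DnaSP's implementation, including the treatment of stop codons, gaps, and codons that differ at several positions. The numbers should therefore not be expected to match Supplementary Table S4 exactly. The standard genetic code is also assumed, and all function names are hypothetical.

```python
import itertools
import math

# Standard genetic code (assumed to apply to plastid protein-coding genes).
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = dict(zip(CODONS, AMINO))


def synonymous_sites(codon):
    """Nei-Gojobori synonymous sites; changes to a stop codon count as non-synonymous."""
    sites = 0.0
    for pos in range(3):
        for base in BASES:
            if base != codon[pos]:
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == CODE[codon]:
                    sites += 1 / 3
    return sites


def codon_differences(c1, c2):
    """Synonymous and non-synonymous differences averaged over all mutational pathways."""
    diff = [i for i in range(3) if c1[i] != c2[i]]
    syn = non = 0.0
    paths = 0
    for order in itertools.permutations(diff):
        current, s, n, valid = c1, 0, 0, True
        for pos in order:
            nxt = current[:pos] + c2[pos] + current[pos + 1:]
            if CODE[nxt] == "*":
                valid = False  # pathways passing through a stop codon are ignored
                break
            if CODE[nxt] == CODE[current]:
                s += 1
            else:
                n += 1
            current = nxt
        if valid:
            syn, non, paths = syn + s, non + n, paths + 1
    return (syn / paths, non / paths) if paths else (0.0, 0.0)


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; NaN when the correction is undefined."""
    arg = 1 - 4 * p / 3
    return 0.75 * math.log(1 / arg) if arg > 0 else math.nan


def ka_ks(seq1, seq2):
    """Return (Ka, Ks, Ka/Ks) for two aligned, in-frame coding sequences."""
    S = N = Sd = Nd = 0.0
    usable = min(len(seq1), len(seq2)) // 3 * 3
    for i in range(0, usable, 3):
        c1, c2 = seq1[i:i + 3].upper(), seq2[i:i + 3].upper()
        if c1 not in CODE or c2 not in CODE or "*" in (CODE[c1], CODE[c2]):
            continue  # skip gapped/ambiguous codons and stop codons
        s = (synonymous_sites(c1) + synonymous_sites(c2)) / 2
        S, N = S + s, N + (3 - s)
        sd, nd = codon_differences(c1, c2)
        Sd, Nd = Sd + sd, Nd + nd
    if S == 0 or N == 0:
        raise ValueError("no usable codons")
    ka, ks = jukes_cantor(Nd / N), jukes_cantor(Sd / S)
    return ka, ks, (ka / ks if ks > 0 else math.nan)
```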
To assess the nucleotide diversity (Pi), the complete plastid genome sequences of A. camansi and A. heterophyllus were first aligned using the MAFFT aligner tool available in the Geneious software. A sliding window analysis was then run to calculate the Pi values using the software DnaSP v6.12.03 [43] with a window length of 600 bp and a step size of 200 bp.
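For two aligned sequences, nucleotide diversity reduces to the proportion of differing sites. A minimal sliding-window sketch with the window length and step size given above might look as follows. It assumes that columns containing gaps or ambiguous bases are excluded from each window and that each window is reported at its midpoint. These conventions may differ from DnaSP's, and the function name `sliding_window_pi` is hypothetical.

```python
from typing import List, Tuple


def sliding_window_pi(seq_a: str, seq_b: str, window: int = 600,
                      step: int = 200) -> List[Tuple[int, float]]:
    """Return (window midpoint, Pi) pairs for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    results = []
    for start in range(0, len(seq_a) - window + 1, step):
        diffs = sites = 0
        chunk_a = seq_a[start:start + window].upper()
        chunk_b = seq_b[start:start + window].upper()
        for x, y in zip(chunk_a, chunk_b):
            if x in valid and y in valid:  # skip gap or ambiguous columns
                sites += 1
                diffs += x != y
        results.append((start + window // 2, diffs / sites if sites else 0.0))
    return results
```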

Comparative Plastid Genome Analysis
The complete plastid genome of A. camansi was compared with the plastid genomes available for six other moraceous species using the mVISTA program in Shuffle-LAGAN mode [44], with the newly assembled and annotated A. camansi plastid genome as the reference. The positions of the borders between the LSC, SSC, and IR regions (LSC/IRb/SSC/IRa) were plotted and compared between A. camansi and the six other species using IRscope [45].

Phylogenetic Analyses
Fifty-six protein-coding genes were obtained for 34 species from three plant families (Moraceae, Rhamnaceae, and Cannabaceae) and for two species (Glycine soja and Vigna unguiculata) included as outgroups. All sequences were obtained from NCBI GenBank except those of A. camansi (see Supplementary Table S1 for species and accession numbers). The gene sequences were aligned and concatenated, and the concatenated alignment was used to select the best-fit nucleotide substitution model, whose parameters served as priors for the Bayesian analysis. The model was selected using the Bayesian Information Criterion (BIC), as implemented in jModelTest 2 [46]. The GTR+G model (−lnL = 187133.0707, wBIC = 0.7612) was selected, with a gamma shape parameter of 0.2480.
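BIC-based model selection ranks candidate models by BIC = -2 lnL + K ln(n), where K is the number of free parameters and n the sample size. Score differences are then converted into normalized weights, the quantity reported above as wBIC. The sketch below illustrates the calculation with hypothetical scores that are not values from this study. How jModelTest 2 counts parameters (for example, whether branch lengths are included) and defines n are program settings that this sketch does not reproduce.

```python
import math
from typing import Dict


def bic(log_likelihood: float, n_params: int, sample_size: int) -> float:
    """Bayesian Information Criterion: -2 lnL + K ln(n)."""
    return -2.0 * log_likelihood + n_params * math.log(sample_size)


def bic_weights(scores: Dict[str, float]) -> Dict[str, float]:
    """Convert BIC scores (lower is better) into weights that sum to 1."""
    best = min(scores.values())
    relative = {m: math.exp(-(s - best) / 2.0) for m, s in scores.items()}
    total = sum(relative.values())
    return {m: r / total for m, r in relative.items()}


if __name__ == "__main__":
    # Hypothetical scores for illustration only; not values from this study.
    scores = {"GTR+G": 1000.0, "GTR+I+G": 1002.0, "HKY+G": 1010.0}
    for model, w in sorted(bic_weights(scores).items(), key=lambda x: -x[1]):
        print(f"{model}: {w:.4f}")
```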
The phylogenetic tree was reconstructed by Bayesian inference in MrBayes v.3.2.7 [47]. The analysis consisted of four independent runs of 10 × 10^6 generations each, with the posterior distribution sampled every 500 generations. The first 2500 sampled trees were discarded as burn-in before the consensus tree was constructed, to exclude samples taken before the chains had converged. The final tree was edited and visualized in FigTree v1.4.4 [48].

Plastid Genome Assembly, Organization, and Features
A total of 36.9 Gbp of data, comprising 182,485,953 Illumina paired-end raw reads, was downloaded from NCBI and used to assemble the plastid genome of A. camansi. After subsampling with the -subsample option, a total of 21,841,505 paired-end reads (mean length of 99 bp) were retained, and the plastid genome was successfully assembled with Fast-Plast. The mean genome coverage of the alignments was 10,733×, with a standard deviation of 1683 (median = 10,849×; minimum = 2047×; maximum = 17,649×; Supplementary Figure S1). The chloroplast genome sequence was deposited in GenBank under accession number MW149075.
The plastid genome of A. camansi is 160,096 bp in length and presents the circular quadripartite structure typical of angiosperms. It comprises a large single copy (LSC) region of 88,745 bp and a small single copy (SSC) region of 19,883 bp, separated by a pair of inverted repeat (IR) regions, each 25,734 bp long (Figure 1; Table 1). The plastid genome of A. camansi is similar in size to that of A. heterophyllus [49] and to those of other moraceous species [12,50,51] (see also Table 1).
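The region lengths reported above are internally consistent: 88,745 + 19,883 + 2 × 25,734 = 160,096 bp. The sketch below checks this and derives 1-based junction coordinates. It assumes the conventional LSC-IRb-SSC-IRa order with position 1 at the start of the LSC. The deposited GenBank record may use a different origin, and the function name is hypothetical.

```python
from typing import Dict, Tuple


def quadripartite_layout(lsc: int, ir: int, ssc: int) -> Tuple[Dict[str, Tuple[int, int]], int]:
    """1-based inclusive coordinates, assuming the order LSC-IRb-SSC-IRa from position 1."""
    layout, start = {}, 1
    for name, length in (("LSC", lsc), ("IRb", ir), ("SSC", ssc), ("IRa", ir)):
        layout[name] = (start, start + length - 1)
        start += length
    return layout, start - 1


if __name__ == "__main__":
    regions, total = quadripartite_layout(lsc=88_745, ir=25_734, ssc=19_883)
    print(total)  # should equal the reported genome length
    for name, (begin, end) in regions.items():
        print(name, begin, end)
```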
The total GC (guanine-cytosine) content was 36.0%, which is also very similar to that of A. heterophyllus (36.1%) and to the plastomes of other moraceous species, whose overall GC content ranges from 35.6% to 36.4% (Table 1; see also Supplementary Table S1). The GC content of the IR regions (42.7%) was higher than that of the LSC (33.7%) and SSC (28.8%) regions (Supplementary Table S2). The high GC content of the IR regions may be attributed to the rRNA and tRNA genes they contain, which have a relatively high GC content and occupy most of these regions [52,53].
Figure 1. The genes are color-coded according to their functional groups. Genes on the inside of the circle are transcribed clockwise, and those on the outside are transcribed counterclockwise. The inner circle shows the quadripartite structure of the chloroplast genome, that is, the small single copy (SSC), large single copy (LSC), and pair of inverted repeats (IRa and IRb). The darker and lighter gray shading in the inner circle corresponds to guanine-cytosine (GC) and adenine-thymine (AT) content, respectively.
The plastid genome of A. camansi was predicted to encode 113 different genes: 79 protein-coding genes, 30 transfer RNA (tRNA) genes, and four ribosomal RNA (rRNA) genes (Figure 1; Table 2). Eighteen genes were completely duplicated in the IR regions, including seven protein-coding genes (ndhB, rpl2, rpl23, rps7, rps12, ycf2, and ycf15), seven tRNAs (trnI-CAU, trnL-CAA, trnV-GAC, trnI-GAU, trnA-UGC, trnR-ACG, and trnN-GUU), and all four rRNAs (rrn4.5, rrn5, rrn16, and rrn23), and part of the 5′ end of ycf1 was also duplicated. Of the remaining genes, the LSC region contains 22 tRNA and 61 protein-coding genes, while the SSC region contains one tRNA gene and 10 protein-coding genes.
Table 2. List of the genes found in the plastid genome of A. camansi.
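Regional GC values such as those reported for the LSC, SSC, and IR regions can be computed by slicing the genome with the region coordinates from the annotation. This is a minimal sketch. It assumes 1-based, inclusive coordinates and excludes ambiguous bases (for example, N) from the denominator. Both choices are assumptions, not the documented procedure of this study, and the function names are hypothetical.

```python
from typing import Dict, Tuple


def gc_percent(seq: str) -> float:
    """GC content (%) computed over unambiguous A/C/G/T bases only."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    return 100.0 * (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0


def regional_gc(genome: str, regions: Dict[str, Tuple[int, int]]) -> Dict[str, float]:
    """GC content per region, rounded to one decimal; coordinates are 1-based and inclusive."""
    return {
        name: round(gc_percent(genome[start - 1:end]), 1)
        for name, (start, end) in regions.items()
    }
```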
The codon usage was analyzed using the 79 protein-coding gene sequences, which have a strong (63%) AT bias (Supplementary Table S2). A total of 23,068 codons were identified in the A. camansi plastid genome (Table 3), representing all 64 codon types and encoding all 20 amino acids. The most abundant amino acid was leucine, with 2523 codons (approximately 10.94% of all codons). Excluding stop codons, cysteine was the least abundant (270 codons, 1.17% of the total). Somaratne et al. [57] also observed a prevalence of leucine and a low frequency of cysteine in the plastid genomes of species of the bush clover genus Lespedeza. The most abundant codon was ATT, which encodes isoleucine. Tryptophan (TGG) and methionine (ATG) were each encoded by a single codon (Table 3). The GC content at the different codon positions is considered a principal factor shaping codon usage bias in many organisms. In A. camansi, the mean GC content was 44.7% at the first codon position, 37.1% at the second, and 29.4% at the third (Supplementary Table S2). This indicates a strong bias toward A/T at the third codon position, which is consistent with previous studies of plastid genomes [57,58]. The presence of preferred codons suggests that codon usage in the plastid genome has been shaped by both natural selection and mutational bias [58]. This codon usage pattern is highly similar to that found in other moraceous species [12,50], consistent with the high conservation of these genomes.
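The codon counts, amino acid frequencies, and position-specific GC contents discussed above can be tallied as in the following sketch. It assumes in-frame coding sequences and the standard genetic code. It also assumes that stop codons are counted and reported separately under '*'. The function name and input format are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

BASES = "TCAG"
CODE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG",
))  # standard genetic code; '*' marks stop codons


def codon_usage(cds_list: Iterable[str]) -> Tuple[Counter, Counter, List[float]]:
    """Codon counts, amino acid counts and GC% at codon positions 1-3 across CDSs."""
    codons = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if codon in CODE:  # skips codons containing gaps or ambiguous bases
                codons[codon] += 1
    amino_acids = Counter()
    for codon, n in codons.items():
        amino_acids[CODE[codon]] += n
    total = sum(codons.values())
    gc_by_position = [
        100.0 * sum(n for c, n in codons.items() if c[pos] in "GC") / total if total else 0.0
        for pos in range(3)
    ]
    return codons, amino_acids, gc_by_position
```

With the 79 coding sequences as input, dividing the leucine count by the total number of codons would give the leucine share reported above.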
The A/T mononucleotide repeats were the most abundant SSRs identified in the A. camansi and A. heterophyllus plastid genomes, accounting for 67.5% (54) and 66.2% (49) of the SSRs, respectively (Figure 2B). As expected, given that the plastid genomes of higher plants are generally AT-rich, most of the di-, tri-, and tetranucleotide SSRs were composed of A and T. This reflects a strong AT bias in the SSRs of these two Artocarpus species, consistent with the pattern observed in Morus mongolica, whose SSRs have an even higher AT content (99.7%) [12]. The SSRs were distributed across the LSC, SSC, and IR regions of A. camansi, but were more abundant in the LSC than in the SSC and IR regions (Figure 2C). Overall, 52 of the 80 SSRs identified in the A. camansi plastid genome were located in intergenic regions (65.0%), 15 in introns (18.8%), and 13 in coding regions (16.3%), including atpF, rpoB, atpB, ndhF, rps15, and ycf1. The ycf1 gene contained five SSRs, and the ndhF gene contained four (Figure 2D). In A. heterophyllus, by comparison, 57 SSRs were located in intergenic regions (77.03%), 11 in introns (14.86%), and 6 in coding regions (8.11%), including rpoC2, rpoB, atpB, and ndhF (Figure 2D). These differences are probably due to the fact that the ycf1 gene of A. heterophyllus, unlike that of A. camansi, does not encompass the IR-SSC boundary (see below). The ycf1 gene of A. camansi encodes a protein of 1867 amino acids. In most land plants, the portion of ycf1 located in the IR region is short and conserved, whereas the portion located in the SSC region is highly variable [61,62]. This region has also been shown to be more variable than matK in many taxa, and may thus be suitable for molecular systematics at low taxonomic levels and for DNA barcoding [61,62,63].
The repetitive structure of the plastid genome may promote rearrangement and increase the genetic diversity of plant populations [64,65]. A total of 50 repeat structures were identified in the A. camansi plastid genome (Forward = 21; Palindromic = 29; Figure 3A), whereas 61 were identified in the plastid genome of A. heterophyllus (Forward = 32; Palindromic = 27; Reverse = 2; Figure 3B). No complementary repeats were found in either plastid genome, and reverse repeats were detected only in A. heterophyllus (Figure 3B). The repeats were also longer in A. heterophyllus (30-124 bp) than in A. camansi (30-69 bp). The LSC region contained more repeats than the IR and SSC regions in both species (Figure 3C). Moreover, most repeats were located in intergenic regions and introns. Only seven repeats were identified in the coding regions of A. heterophyllus and nine in those of A. camansi, in the ycf2, psaB, trnG-UCC, and trnS-UGA sequences of both species (Figure 3D). The largest number of repeats was located in the ycf2 gene (four repeats in both A. camansi and A. heterophyllus), which may reflect the length of the ycf2 sequence (6888 bp in A. camansi and 6846 bp in A. heterophyllus). This is consistent with the pattern observed in Carya illinoinensis [52], Citrus sinensis [66], Nasturtium officinale [67], and Toxicodendron vernicifluum [65].
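Forward and palindromic repeats differ only in whether the second copy is read on the same strand or as a reverse complement. The sketch below illustrates this distinction with exact matches: palindromic repeats are found as forward matches between the sequence and its reverse complement. Unlike REPuter, it allows no mismatches, so the Hamming distance of 3 used here is not implemented. It also does not detect reverse or complement repeats. It is therefore a simplified illustration, not a reimplementation. The function names are hypothetical, and coordinates are 0-based.

```python
from collections import defaultdict
from typing import List, Tuple

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def maximal_matches(a: str, b: str, k: int) -> List[Tuple[int, int, int]]:
    """Exact matches of length >= k between a and b, each reported once at its left end."""
    index = defaultdict(list)
    for p in range(len(b) - k + 1):
        index[b[p:p + k]].append(p)
    hits = []
    for i in range(len(a) - k + 1):
        for p in index.get(a[i:i + k], ()):
            if i > 0 and p > 0 and a[i - 1] == b[p - 1]:
                continue  # not left-maximal: an earlier seed already covers this match
            length = k
            while i + length < len(a) and p + length < len(b) and a[i + length] == b[p + length]:
                length += 1
            hits.append((i, p, length))
    return hits


def find_repeats(seq: str, min_len: int = 30):
    """Forward (direct) and palindromic (reverse-complement) exact repeats >= min_len."""
    seq = seq.upper()
    n = len(seq)
    forward = [(i, p, L) for i, p, L in maximal_matches(seq, seq, min_len) if i < p]
    palindromic = []
    for i, p, L in maximal_matches(seq, reverse_complement(seq), min_len):
        j = n - p - L  # start of the partner copy on the forward strand
        if i < j:  # each palindromic pair is found twice; keep one orientation
            palindromic.append((i, j, L))
    return forward, palindromic
```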

The Ka/Ks Ratio and Nucleotide Diversity
In coding regions, substitutions can be either synonymous or non-synonymous, and their rates per site are denoted Ks and Ka, respectively. Synonymous substitutions, or silent mutations, do not alter the amino acid sequence of the encoded protein, whereas non-synonymous substitutions do. The rps19 gene, which encodes a protein of the small ribosomal subunit, had the highest non-synonymous rate (0.0334), while the psbT gene, which is associated with the core complex of photosystem II (PSII), had the highest synonymous rate (0.0864) (Supplementary Table S4).
The ratio of the non-synonymous to the synonymous rate (Ka/Ks) was determined for the 79 protein-coding genes shared by the plastid genomes of A. camansi and A. heterophyllus (Figure 4 and Supplementary Table S4). The Ka/Ks ratios between the two Artocarpus species ranged from 0.000 to 2.299 (mean = 0.317). The lowest Ka/Ks ratios were observed in genes that encode subunits of photosystem I (mean = 0.010) and photosystem II (0.038), the large subunit of RuBisCO (0.033), and subunits of ATP synthase (0.094).

The Ka/Ks ratio reflects the type of selective pressure acting on a protein-coding gene. A Ka/Ks ratio greater than 1 indicates positive selection, whereas a ratio less than 1 indicates negative (purifying) selection. A ratio around 1 indicates either neutral evolution or an averaging of sites under positive and negative selective pressures [68,69]. Here, a Ka/Ks value of 0 was recorded for 39 genes: 3 in the SSC region (psaC, ndhE, and rpl32), 4 in the IR region (rpl23, ndhB, rps7, and rps12), and 32 in the LSC region (Figure 4). A Ka/Ks value of 0 means that no non-synonymous substitutions were detected between the two species. This is consistent with very strong purifying selection on these genes, although the low overall divergence between these congeneric species may also contribute.
The Ka/Ks ratios were below 1 for 70 of the 79 protein-coding genes, indicating that purifying selection has acted on these genes and that they have been strongly conserved during the evolutionary history of the genus. The remaining nine genes had Ka/Ks ratios indicative of positive selection. These genes encode proteins of the large ribosomal subunit (rpl33 and rpl20), the small ribosomal subunit (rps3 and rps19), RNA polymerase subunits (rpoA and rpoC2), and proteins of unknown function (ycf2 and ycf4). The ninth is clpP, which encodes the proteolytic subunit of the ATP-dependent Clp protease.
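The interpretation rule stated above can be written as a small classification function. The tolerance band used to call a ratio "around 1" is a hypothetical choice, since the text gives no threshold. The function also flags genes with Ks = 0, for which the ratio is undefined. Its name is illustrative.

```python
def classify_selection(ka: float, ks: float, tolerance: float = 0.05) -> str:
    """Label a gene by its Ka/Ks ratio; the tolerance band around 1 is an arbitrary choice."""
    if ks == 0:
        return "undefined (Ks = 0)"
    ratio = ka / ks
    if abs(ratio - 1.0) <= tolerance:
        return "neutral or mixed"
    return "purifying" if ratio < 1.0 else "positive"
```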
The estimated mean nucleotide diversity (Pi) between A. camansi and A. heterophyllus was 0.00753, with window values ranging from 0.00 to 0.04. The mean nucleotide diversity was 0.00877 in the LSC region (range: 0-0.03667), 0.01239 in the SSC region (0.00167-0.04000), and 0.00346 in the IR region (0.00000-0.02000). These values reflect little divergence between the two plastid genomes, particularly in the IR region. A low level of sequence divergence (Pi = 0.00432) was also found among five Morus species [70]. Similar results were obtained in a comparison of the plastid genomes of the sister species Allium macranthum and Allium fasciculatum, with a mean nucleotide diversity of 0.00609 in the IR region, in contrast with 0.01060 in the LSC and 0.01735 in the SSC region [71]. Kong et al. [72] also found that the IR region was the most conserved in the genomes of 14 Aconitum species, with a Pi value of 0.001079, in comparison with 0.007140 in the LSC region and 0.008368 in the SSC region.
Five regions (trnH-GUG-psbA, trnG-UCC-trnR-UCU, trnT-UGU-trnL-UAA, psbE-petL, and rpl32-trnL-UAG) were highly variable, with Pi values of over 0.03 (Figure 5). The first four of these loci are intergenic spacers in the LSC region, whereas the last is located in the SSC region. The intergenic region rpl32-trnL, which had the highest nucleotide diversity (Pi = 0.04) in the present study, has also been found to be highly variable in the Machilus, Morus, and Solanum plastid genomes and in other spermatophytes [61,62,70,73,74]. Two of these spacers have previously been identified as hotspots in Morus and may therefore be variable throughout the family Moraceae. trnT-trnL was highly variable between two species of Morus, and psbE-petL was highly variable among all five Morus plastomes [12,70]. The trnH-psbA spacer of Artocarpus was also highly variable (Pi = 0.03667). This region has been widely used as a plant DNA barcode. When combined with ITS2, it was significantly more efficient for taxon identification than matK+rbcL in 18 families and 21 genera, suggesting that it may be an optimal marker for species identification [75]. Overall, all these regions may be especially valuable for further phylogenetic analyses of the genus and have good potential for use as barcode markers.

Comparative Plastid Genome Structure
The comparative analysis of the structural characteristics of the chloroplast genomes of seven moraceous species revealed a high level of sequence similarity, indicating that these plastid genomes are highly conserved (Figure 6). The analyses also clearly showed that the IR regions are more conserved than the SSC and LSC regions, which may be due to copy correction between the two IR copies through gene conversion [76]. The most variable loci in the seven chloroplast genomes compared here were found in the matK, rpoC2, rps19, ndhF, and ycf1 genes. These genes were also found to be highly divergent in other plastid genomes [61,67,77,78], and may thus constitute useful markers for phylogenetic analyses at different taxonomic levels. Among the noncoding regions, a high level of variation was observed in the intergenic spacers trnH-GUG-psbA, matK-rps16, rps16-psbK, trnG-UCC-trnR-UCU, trnT-UGU-trnL-UAA, trnL-UAA-trnF-GAA, trnF-GAA-ndhJ, accD-psaI, psbE-petL, and rpl32-trnL-UAG. Some of these noncoding regions also had the highest levels of nucleotide diversity observed in this study.
Forests 2020, 11, x; doi: FOR PEER REVIEW 13 of 19
The contraction and expansion of the IR regions at their boundaries with the single-copy regions are considered the primary mechanism of length variation among the plastid genomes of higher plants [79]. Nevertheless, the length of the IR regions was highly similar among the seven moraceous species analyzed here, ranging from 25,678 bp in F. racemosa, M. indica, and M. mongolica to 25,902 bp in F. carica (Figure 7). In most species, the rps19 gene is located to the left of the LSC-IRb boundary (JLB), and rpl2 is located to the right. The exception was F. carica, in which the LSC-IRb boundary falls within rps19, with 108 bp of the gene located in the IRb (Figure 7). The IRb-SSC boundary (JSB) is located within the ndhF gene, so that between 13 bp (in F. carica) and 42 bp (in A. camansi) of its coding sequence lies in the IRb region. In A. heterophyllus and F. carica, ycf1 is also located at the IRb/SSC junction, with lengths of 4 bp and 84 bp, respectively (Figure 7). The SSC-IRa boundaries (JSA) were located within the ycf1 pseudogene, with the fragment located in the IRa region ranging from 983 bp in A. camansi to 1079 bp in F. racemosa (Figure 7). In A. heterophyllus, the ycf1 pseudogene at the SSC-IRa boundary was unannotated. The IRa-LSC boundary (JLA) is located between rpl2 and trnH. The trnH gene was unannotated in F. racemosa.
The present study is the first to compare the IR boundaries among species of the family Moraceae, with the aim of better understanding the evolution of the plastome. The analysis clearly demonstrated that the IR regions and genome size are highly conserved in the study species.

Phylogenetic Analyses
The phylogeny inferred from the 56 protein-coding genes of 33 species of the families Moraceae, Rhamnaceae, and Cannabaceae (Figure 8) supports the hypothesis of a high level of plastome conservation in the Moraceae. The families formed three well-defined clusters. The Moraceae was recovered as sister to the Cannabaceae, and the Rhamnaceae, in turn, as sister to that clade. This arrangement also reinforces the genetic similarity of the plastid genomes of A. camansi and A. heterophyllus. Previous studies have also supported the phylogenetic proximity of Artocarpus and Morus. In a phylogeny based on the chloroplast ndhF gene, Datwyler and Weiblen [80] recovered a cluster that included A. camansi, A. heterophyllus, A. altilis, A. vriesiana, M. nigra, and M. alba. Zerega et al. [25] analyzed nuclear ITS sequences and chloroplast trnL-F sequences. Their results also confirmed the genetic proximity of Artocarpus to M. alba, with these species diverging from Humulus lupulus and Cannabis sativa. The whole-genome sequence of A. camansi is also consistent with the inference that the divergence of Artocarpus from Morus involved a whole-genome duplication event in the tribe Artocarpeae [29].

Conclusions
The present study describes the plastid genome of A. camansi and confirms that the structure of the plastome is highly conserved in the family Moraceae. In particular, the study corroborates the hypothesis that plastid genomes have been strongly conserved during the evolution of these plants, contributing to the understanding of the genomic features of the chloroplast genes common to the different moraceous species. The chloroplast microsatellites and the most variable coding and noncoding regions identified in the present study may also provide valuable molecular markers for further research into the evolution of the Moraceae.