Tracing the Geographic Origin of Merbau (Intsia palembanica Miq.) in Century Old Planting Trials

Abstract: Our study highlights the utilization of a genetic database for wood-origin identification in Intsia palembanica, a valuable heavy hardwood from tropical forests. This forensic tool is essential for strengthening the verification of legality in the wood supply chain from the forest to the end-users. An increasing number of rules and regulations are being put in place to promote sustainable practice in the timber trade, one of which involves ensuring that importers declare the correct species name and geographic origin of the timber. We aimed to determine the origin of the I. palembanica seed source used in the early establishment on the Forest Research Institute Malaysia (FRIM) campus. DNA samples of I. palembanica individuals from the FRIM campus were obtained and analyzed using four chloroplast (cp) DNA markers to characterize the haplotype variants for population identification. In addition, the DNA samples were genotyped at 14 short tandem repeat (STR) loci for individual identification. Individuals were assigned to their possible geographic origin through an assignment test. On the basis of our recently developed I. palembanica genetic databases, the seed source for the early establishment was inferred to have originated from a mixture of several sources, with a large portion from the southern region (89%) and a relatively small portion from the northern region (11%) of Peninsular Malaysia. The seed source used for the early establishment on the century-old FRIM campus was inferred to have originated from several forest reserves located not far from the planting sites. This study proves the applicability of the DNA method in supply-chain verification, where an unknown I. palembanica tree can be traced to its geographic origin using genetic databases.


Introduction
World forests cover 4.06 billion hectares (ha), equivalent to 31% of the global land area, and they are home to most of Earth's terrestrial biodiversity [1,2]. To date, more than 60,000 tree species are estimated to have been recorded, and more than 13% of them are categorized as globally threatened in the International Union for Conservation of Nature (IUCN) Red List, due in part to the alarming rates of deforestation and forest degradation [3,4]. An estimated 420 million ha of forest has been lost worldwide through deforestation since 1990, with the annual rate of deforestation estimated at 10 million ha in the most recent five-year period (2015-2020) [5]. In view of the threat to global biodiversity and forest ecosystems, several countries have introduced laws to protect natural forests by promoting sustainable practice in the timber trade, requiring importers to declare the species and geographic origin of the timber and, most importantly, its legal harvest. These laws include the Wild Animal and Plant Protection and Regulation of International and Interprovincial Trade Act in Canada (1992), the Lacey Act of the United States (2008), the European Union Timber Regulation (2010), the Australian Illegal Logging Prohibition Act (2012), and the Japanese Clean Wood Act (2017), which have been adopted in the respective consumer countries or regions. To support the implementation of these laws, various wood identification methods and timber tracking tools are currently available, such as wood anatomy, genetics, stable isotopes, direct analysis in real time (DART) coupled with time-of-flight (TOF) mass spectrometry, and near-infrared (NIR) spectroscopy [6]. Each of these techniques has its own strengths and limitations, and they complement each other in identifying the timber species and geographic origin of timber samples. The challenges of combining different timber tracking methods in wood identification are still being worked out, and various potential mitigation measures have been proposed by the forensic science communities [6].
Intsia palembanica, locally known as merbau, is among the most valuable tropical heavy hardwoods. It is distributed throughout the Andaman Islands, Thailand, and Malesia eastward to western New Guinea in inland lowland forests, mainly in low-lying areas along rivers and up to 1000 m above sea level [7]. The timber is prized for its strength and durability and is widely used for high-class general construction, outdoor furniture, decking, interior finishing, paneling, parquet flooring, veneer, and decorative and novelty items [8]. In the trade market, the prices for merbau timber are around USD 474 per cubic meter for logs and USD 1121 per cubic meter for sawn timber in general market specification (GMS) [9]. This is the second highest price, after chengal wood, which is marketed at USD 1770 per cubic meter for sawn timber. The international market demand for high-quality timber products is driving greater exploitation of the species, and concern has arisen over the sustainability of harvests from natural populations. According to the IUCN's Red List of Threatened Species, merbau is classified as a vulnerable species across its range [4]. An investigation focused on the logging of merbau found that most large international flooring producers include this species in their product ranges; however, only a few of them are able to prove the legal origin of their merbau supply [10]. Nongovernmental organizations such as the Environmental Investigation Agency have reported a series of seizures of illegal merbau logs by enforcement officers in Indonesia, both on land and at sea, in an effort to curb the tide of illegal timber [11]. Therefore, tools for verifying the origin of merbau timber are valuable in assisting the investigation process and in providing evidence for subsequent court proceedings.
Genetic markers such as chloroplast DNA (cpDNA) and short tandem repeats (STR) that enable species identification, the geographic origin traceability of wood, and individual tree identification have the potential to assist enforcement officers in the investigation of illegal logging cases [12]. By using genetic markers to identify an unknown sample, inferences are made on the basis of eco-evolutionary processes, such as mutation, migration, selection and adaptation, genetic drift, and speciation [6]. Plastid genomes evolve relatively slowly and, thus, cpDNA markers are ideal for distinguishing species and genera [13]. For population identification, cpDNA markers that are variable enough to reveal geographical structure are suitable for drawing conclusions about the geographic origin in timber species [14][15][16]. On the other hand, nuclear genomes evolve at varied rates. Fast-evolving loci such as STR markers are ideal for fine-scale tracing such as identifying individuals [17]. The usefulness of cpDNA and STR markers has been reported in species identification and in tracing the geographical origin of some economically important temperate tree species such as Pinus sylvestris, Picea abies, Abies alba, Larix decidua, Fraxinus excelsior, and Quercus robur [18][19][20][21][22], as well as tropical timber trees such as Neobalanocarpus heimii, Gonystylus bancanus, Shorea platyclados, and I. palembanica [14][15][16][23]. The DNA profiles generated for temperate tree species have been presented as strong proof in several court cases in Poland [19,22]. This proof is based on the high probability (approximately 98%-99%) obtained when comparing the identity of a piece of evidence (i.e., stolen wood) with a piece of reference (i.e., a stump in the forest), thus supporting the decisions taken by several district courts in Poland.
The Forest Research Institute Malaysia (FRIM) campus has an area of 544.3 ha. The campus ground was formerly an area stripped of its original forest cover and degraded by tin-mining activities during the 1920s [24]. Establishment of trial plantings of various timber species for the rehabilitation of the ex-tin-mining area started as early as 1926 [25]. The plantation trials covered about 100 indigenous and exotic species. Recently, a genetic database for an important tropical timber species, I. palembanica, was established using both cpDNA and STR markers [23]. By utilizing both the cpDNA haplotype and the STR database, we aimed to unravel the origins of the I. palembanica seed source used in the early establishment of the planting trials. This study presents the utilization of the database to infer the geographic origin of I. palembanica on the FRIM campus.

Sample Collection and DNA Extraction
The Forest Research Institute Malaysia (FRIM) campus ground (3.23722° north (N), 101.63448° east (E)) was transformed from a barren tin-mining area almost 100 years ago to a successful forest rehabilitation model with multiple rainforest tree species (Figure 1) [24]. According to the planting records available in the institute, the early establishment of I. palembanica trial plots was carried out in several different years from 1927 to 1950 [26]. A total of 70 individuals of I. palembanica were sampled from these planting plots (Figure 2). Leaf tissue samples were collected and brought back to the laboratory. Total genomic DNA was extracted using the cetyltrimethylammonium bromide (CTAB) method [27] with modification (2× CTAB) and purified using a High Pure Polymerase chain reaction (PCR) Template Preparation Kit (Roche Diagnostics, GmbH, Penzberg, Germany).

Chloroplast DNA Analyses
A total of four cpDNA markers, namely, rps4 and the intergenic spacers of atpB-rbcL, psbM-trnD, and trnD-trnE, were used to characterize the haplotype variants in I. palembanica [23]. PCR was performed in 10 μL reaction mixtures containing 1× Type-it Multiplex PCR Master Mix (Qiagen), 0.2 μM of each primer, and 10 ng of template DNA. The reactions were carried out using a SimpliAmp Thermal Cycler (Thermo Fisher Scientific, Waltham, MA, USA), with an initial activation step at 95 °C for 5 min, 35 cycles of 95 °C for 30 s, 50 °C for 90 s, and 72 °C for 1 min, and a final extension at 60 °C for 30 min. ExoSAP-IT (Thermo Fisher Scientific) was used to clean up the PCR products, and sequencing was subsequently carried out in both directions using an ABI 3130xl capillary electrophoresis system (Applied Biosystems, Foster City, CA, USA). The sequences were analyzed and assembled using Sequencher 5.1 (Gene Codes Corporation, Ann Arbor, MI, USA), and haplotypes were determined on the basis of nucleotide substitutions and indels (insertions and deletions). These haplotypes were used to infer the source of origin, i.e., whether northern or southern Peninsular Malaysia, by comparison with the reference population identification database, i.e., the cpDNA haplotype database of I. palembanica [23].
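The haplotype-calling step described above can be sketched as follows. This is a minimal illustration, not the Sequencher workflow used in the study: it assumes the sequences are already assembled, concatenated, and aligned to equal length, and it treats an alignment gap ("-") as a character state so that indels distinguish haplotypes alongside substitutions. The sample identifiers, the sequences, and the labels hap1, hap2, ... are hypothetical and do not correspond to the database haplotypes.

```python
def variable_sites(aligned):
    """Return the alignment positions at which the samples differ."""
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    return [i for i in range(length) if len({s[i] for s in seqs}) > 1]


def call_haplotypes(aligned):
    """Give samples the same label when they match at every variable site."""
    sites = variable_sites(aligned)
    labels = {}  # character states at the variable sites -> haplotype label
    calls = {}   # sample id -> haplotype label
    for sample, seq in aligned.items():
        state = "".join(seq[i] for i in sites)
        if state not in labels:
            labels[state] = f"hap{len(labels) + 1}"
        calls[sample] = labels[state]
    return calls


# Hypothetical toy alignment (not real I. palembanica sequences).
ALIGNED = {
    "tree_01": "ATGCCGTA-TTGA",
    "tree_02": "ATGCCGTA-TTGA",
    "tree_03": "ATGACGTA-TTGA",  # substitution at position 3
    "tree_04": "ATGCCGTAATTGA",  # one-base insertion at position 8
}

print(variable_sites(ALIGNED))
print(call_haplotypes(ALIGNED))
```

In practice, the labels would be matched against the haplotypes already named in the reference database rather than numbered in order of appearance.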


Short Tandem Repeat (STR) Analyses
Samples were genotyped using 14 STR markers developed for I. palembanica: Ipa013, Ipa018, Ipa022, Ipa030, Ipa037, Ipa049, Ipa052, Ipa068, Ipa099, Ipa149 [28], IpaT01, IpaT31, IpaT32, and IpaT38 [23]. PCR amplifications were performed in 8 µL reaction volumes containing 10 ng of template DNA, 1× Type-it Multiplex PCR Master Mix (Qiagen, Venlo, The Netherlands), and 0.2 µM of each forward and reverse primer. The reaction mixture was subjected to amplification using a SimpliAmp Thermal Cycler (Thermo Fisher Scientific), with an initial activation step at 95 °C for 5 min, 35 cycles of 95 °C for 30 s, 50 °C for 90 s, and 72 °C for 30 s, and a final extension at 60 °C for 30 min. Amplified products were separated on an ABI 3130xl capillary electrophoresis system (Applied Biosystems) with GeneScan™ 400HD ROX used as the internal size standard. GeneMarker 2.6.4 (Softgenetics, LLC, State College, PA, USA) was used to analyze the profiles of amplified short tandem repeats. In order to assign the individuals to their putative population of origin, an assignment test was carried out on the basis of an individual identification database, i.e., the reference STR database of I. palembanica [23]. Individuals were assigned to populations by using the GeneClass2 software [29], with the Bayesian method [30] and a simulation of 1000 individuals [31]. The simulation algorithm generated population samples of the same size as the reference population sample. Furthermore, a test with exclusion probability was also carried out using the GeoAssign software [32], where the nearest-neighbor approach was utilized for individual assignment.
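The logic of a Bayesian assignment test can be illustrated with a minimal sketch. It assumes the Dirichlet-prior criterion of Rannala and Mountain, one of the Bayesian criteria available in GeneClass2; whether this is the criterion cited as [30] is an assumption, and the code is not the GeneClass2 implementation (it omits the Monte Carlo resampling step entirely). The population names PopA and PopB and all allele sizes are hypothetical toy values; only the locus names Ipa013 and Ipa018 come from the text. Counting the sample's own alleles when determining the number of alleles per locus is a simplification made here.

```python
from collections import Counter
from math import exp, log


def genotype_prob(genotype, ref_alleles, k):
    """Probability of a diploid genotype given a population's sampled alleles,
    with a Dirichlet prior of weight 1/k on each of the k alleles at the locus."""
    counts = Counter(ref_alleles)
    n = len(ref_alleles)  # number of gene copies sampled in the population
    a, b = genotype
    wa = counts[a] + 1.0 / k
    wb = counts[b] + 1.0 / k
    denom = (n + 1) * (n + 2)
    if a == b:
        return wa * (wa + 1) / denom
    return 2 * wa * wb / denom


def assign(sample, reference):
    """Return the best-scoring population and the relative score of each one."""
    k = {}
    for locus, genotype in sample.items():
        pooled = set(genotype)  # simplification: include the sample's alleles
        for alleles in reference.values():
            pooled.update(alleles[locus])
        k[locus] = len(pooled)
    log_lik = {
        pop: sum(log(genotype_prob(sample[l], alleles[l], k[l])) for l in sample)
        for pop, alleles in reference.items()
    }
    top = max(log_lik.values())
    weights = {pop: exp(v - top) for pop, v in log_lik.items()}
    total = sum(weights.values())
    scores = {pop: w / total for pop, w in weights.items()}
    return max(scores, key=scores.get), scores


# Hypothetical reference allele samples (fragment sizes in base pairs).
REFERENCE = {
    "PopA": {"Ipa013": [150, 150, 150, 152, 150, 150],
             "Ipa018": [200, 200, 202, 200, 200, 200]},
    "PopB": {"Ipa013": [152, 154, 154, 154, 152, 154],
             "Ipa018": [204, 204, 202, 204, 204, 204]},
}
# Hypothetical genotype of a tree of unknown origin.
SAMPLE = {"Ipa013": (150, 150), "Ipa018": (200, 202)}

best, scores = assign(SAMPLE, REFERENCE)
print(best, scores)
```

The relative score is each population's likelihood divided by the sum over all reference populations, so a tree is assigned to the population with the highest score even when the true source is absent from the database. This is why a separate exclusion test is useful.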

Results
The geographic origins of I. palembanica planted on the FRIM campus were inferred on the basis of the cpDNA haplotype database (Figure 3). The haplotypes generated from the 70 individuals included H01 (30%), H02 (14%), and H03 (39%), which matched haplotypes reported previously [23]; in addition, four new haplotypes (H07, H08, H09, and H10; 17% in total) were found in the current study (Figure 4). These results suggest that 69% of the planted I. palembanica could have originated from either the northern or the southern region of Peninsular Malaysia, whereas 14% originated from the southern region. The origin of the remaining trees (17%) could not be inferred because they exhibited new haplotypes not found in the database.
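The region-level inference from haplotypes amounts to a lookup followed by a tally, which can be sketched as below. The haplotype frequencies are those reported above; the lookup table is a simplified stand-in for the reference database and covers only the three database haplotypes discussed in the text (H02 in the south only, H01 and H03 in both regions). The function and variable names are hypothetical.

```python
# Regions in which each database haplotype occurs, as described in the text.
# Simplified stand-in for the reference cpDNA haplotype database.
HAPLOTYPE_REGIONS = {
    "H01": {"north", "south"},
    "H02": {"south"},
    "H03": {"north", "south"},
}

# Haplotype frequencies (%) among the 70 sampled trees, as reported above.
OBSERVED_PERCENT = {"H01": 30, "H02": 14, "H03": 39, "new (H07-H10)": 17}


def infer_region(haplotype):
    """Map a haplotype to the region(s) it is compatible with."""
    regions = HAPLOTYPE_REGIONS.get(haplotype)
    if regions is None:
        return "not inferable"  # haplotype absent from the database
    if len(regions) == 1:
        return next(iter(regions))
    return "north or south"


def summarize(observed):
    """Sum the observed frequencies by inferred region category."""
    summary = {}
    for haplotype, percent in observed.items():
        category = infer_region(haplotype)
        summary[category] = summary.get(category, 0) + percent
    return summary


print(summarize(OBSERVED_PERCENT))
```

Only haplotypes restricted to a single region are informative, which is why the shared haplotypes H01 and H03 leave most trees unresolved at this stage.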

In order to increase the resolution of geographic origin to the population level, the STR database was used for an assignment test. The results suggest that the individuals originated from a mixture of different populations. The possible origins included Pasoh (with the highest assignment: 39%), HSelangor (26%), Korbu (10%), BLagong (7%), HGombak (7%), and Yong (7%), whereas the remaining 4% were assigned to Lenggeng, SNipah, and UMudaB (Figure 5). The results were based on the highest probability in the assignment test (Table S1, Supplementary Materials), and exclusion probability was used to show the reliability of the given results (Table S2, Supplementary Materials). Overall, the majority of individuals (89%) were assigned to populations from the southern region of Peninsular Malaysia, with only a small portion (11%) assigned to the northern region (Figure 3). On the basis of the inferred geographic origins for the different planting sites, we found that the seed source used in each site was a mixture of different populations (Table 1). For the two field sites with the highest number of individuals, i.e., Field 12 (17 individuals) and Field 24 (37 individuals), the main seed sources were inferred to have originated from Pasoh (59%) and HSelangor (35%), respectively.

Figure 5. The proportion of Intsia palembanica assigned to respective populations using an assignment test.

Discussion
In this study, we used both cpDNA haplotype (population identification) and STR (individual identification) databases to infer the geographic origins of the planting materials used in the early establishment of I. palembanica. Firstly, the cpDNA haplotype database was used to infer whether the source originated from the northern or southern region of Peninsular Malaysia. We inferred that 14% of the individuals (haplotype H02) originated from the southern region of Peninsular Malaysia. However, 69% harbored the common haplotypes H01 and H03, indicating that the source could have originated from either the northern or the southern region. Four new haplotypes (H07, H08, H09, and H10), which were not captured in our reference database [23], were observed in the remaining 17% of individuals. The limited power of the cpDNA markers to discriminate the geographic origins of I. palembanica could be due to biological characteristics of the plant, such as its seed dispersal capacity. This species is commonly found in riparian habitats where seed dispersal can be assisted by river water. Consequently, a common gene pool attributed to gene flow can be expected between the northern and southern regions of Peninsular Malaysia. Because the chloroplast genome evolves more slowly than the nuclear genome, the common haplotypes H01 and H03 found in both regions could also reflect ancestral introgression or the retention of ancestral polymorphism.
To overcome this limitation, we used the STR database to infer the geographic origins of the seed source at the population level. The assignment test results showed that 80% of the individuals originated from populations located near the FRIM campus (BLagong, HGombak, HSelangor, Lenggeng, and Pasoh; Figure 3, Table 1). According to the planting records available at the institute, I. palembanica was planted in different field plots in several different years from 1927 to 1950 [26] (Table 1). Considering the poor road accessibility during the 1920s, the seeds/wildings of I. palembanica were likely collected from nearby forests for ease of transportation back to the FRIM campus. The remaining 20% of I. palembanica individuals were found to have originated from more distant locations (Yong, SNipah, Korbu, and UMudaB) and were mostly planted after the Japanese Occupation (1941-1945) (Table 1) [26], probably to replace dead seedlings at the planting sites.
This study shows that the I. palembanica cpDNA haplotype database has limited assignment power to infer geographic origin, whereby only 14% of the planted individuals were assigned to the southern region of Peninsular Malaysia. However, this limitation was overcome by using the I. palembanica STR database, whereby the geographic origins could be assigned at the population level. By using the STR database, a total of 89% of the sample trees were deemed to have originated from the southern region of Peninsular Malaysia (BLagong, HGombak, HSelangor, Yong, SNipah, Pasoh, and Lenggeng), whereas the remaining 11% were deemed to have originated from the northern region (Korbu and UMudaB). Therefore, the DNA database developed for I. palembanica proves effective in assisting the tracking of this valuable timber, in addition to conventional methods such as paper-based documentation and paint-marking systems. The advantage of the DNA method over conventional approaches is that it is resilient to falsification because it is based on inherent wood characteristics, such as the unique properties of DNA. A limitation may be encountered in the implementation of this method when the DNA extracted from the test sample is of low quality. Low-quality DNA could cause DNA polymerase amplification errors, resulting in PCR artefacts [20], or the amplification process could be inhibited by residual polysaccharides and polyphenolic compounds [33]. Researchers are addressing these challenges by improving the protocols for DNA extraction from wood samples [34,35]. The current study utilized DNA extracted from leaf samples, and similarly satisfactory amplification results have been shown when using DNA obtained from inner-bark wood samples [23]. As a proof of concept, future studies will test amplification rates when DNA is isolated from heartwood samples.
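The regional shares quoted in the Results and Discussion can be cross-checked with simple arithmetic, sketched below. The per-population tree counts are NOT taken from Table 1 or Table S1: they are back-calculated here by rounding each reported percentage of the 70 sampled trees, and the three populations that together account for the remaining 4% are assumed to have one tree each. The grouping of populations into regions and into sites near the FRIM campus follows the text.

```python
N_TREES = 70

# Reported assignment percentages for the six most frequent populations.
REPORTED_PERCENT = {
    "Pasoh": 39, "HSelangor": 26, "Korbu": 10,
    "BLagong": 7, "HGombak": 7, "Yong": 7,
}
# ASSUMPTION: the remaining 4% (about 3 trees) is split as one tree each.
ASSUMED_SINGLETONS = {"Lenggeng": 1, "SNipah": 1, "UMudaB": 1}

NORTHERN = {"Korbu", "UMudaB"}
NEAR_FRIM = {"BLagong", "HGombak", "HSelangor", "Lenggeng", "Pasoh"}


def back_calculate_counts():
    """Back-calculate tree counts from the rounded percentages (assumed, not source data)."""
    counts = {pop: round(pct * N_TREES / 100) for pop, pct in REPORTED_PERCENT.items()}
    counts.update(ASSUMED_SINGLETONS)
    return counts


def share(counts, populations):
    """Percentage of trees assigned to the given populations, rounded to an integer."""
    selected = sum(n for pop, n in counts.items() if pop in populations)
    return round(100 * selected / sum(counts.values()))


counts = back_calculate_counts()
southern = set(counts) - NORTHERN
distant = set(counts) - NEAR_FRIM
print(sum(counts.values()), share(counts, southern), share(counts, NORTHERN))
print(share(counts, NEAR_FRIM), share(counts, distant))
```

Under these assumptions the counts sum to the 70 sampled trees, and the southern/northern and near/distant shares can be compared directly with the percentages quoted in the text.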

Conclusions
The present study reported on the utilization of a genetic database developed for a highly valuable but vulnerable heavy hardwood, I. palembanica, for geographic origin traceability. By using the recently developed genetic database [23], the seed source for early establishment of I. palembanica set up approximately a century ago was inferred to have originated from several forest reserves located not far from the planting sites. In summary, this study proves the applicability of the DNA method in supply-chain verification, where an unknown I. palembanica tree can be traced to its geographic origin using a genetic database, in this case, within Peninsular Malaysia.
Supplementary Materials: The following are available online at http://www.mdpi.com/1999-4907/11/11/1171/s1. Table S1: Probability of assignment for each sample to the reference database. The highest probability is shown in bold. Table S2: Exclusion probability of assignment for each sample to the reference database.