Plants
  • Article
  • Open Access

19 November 2025

Exploring the Agromorphological Profiles of the Cacao (Theobroma cacao L.) Collection from the INIA Germplasm Bank in the Amazonas Region, Peru

1
Estación Experimental Agraria Amazonas, Dirección de Recursos Genéticos y Biotecnología, Instituto Nacional de Innovación Agraria (INIA), km 3.5 (Carretera Aeropuerto), Chachapoyas 01001, Peru
2
Centro Experimental La Molina, Dirección de Recursos Genéticos y Biotecnología, Instituto Nacional de Innovación Agraria (INIA), Av. La Molina 1981, Lima 15024, Peru
*
Authors to whom correspondence should be addressed.
Plants 2025, 14(22), 3536; https://doi.org/10.3390/plants14223536
(registering DOI)
This article belongs to the Special Issue Characterization and Conservation of Vegetable Genetic Resources

Abstract

Cacao is a strategic crop in Peru due to its significant socioeconomic impact, driving extensive efforts to collect, characterize, and conserve its genetic diversity. This study aimed to establish phenotypic criteria to differentiate and structure the Cacao Amazonas Perú (CAP) germplasm, thereby providing a foundation for selection and breeding programs. A total of 113 accessions from the INIA Germplasm Bank were evaluated over two consecutive growing seasons using 33 quantitative and 18 qualitative agromorphological descriptors. Data were analyzed through univariate and multivariate approaches. The results revealed substantial phenotypic variability, with coefficients of variation reaching up to 37.5% for fruit-related quantitative traits, all exhibiting high heritability values (>60%). Principal component analysis indicated that the first two components explained 29.3% of the total variance, primarily influenced by fruit and seed descriptors. Hierarchical clustering analysis identified eight phenotypic groups; one cluster exhibited high seed mass and a favorable pod index (17.63), while another showed the highest seed index (1.55 g) and the greatest intragroup distance (7.54). This comprehensive characterization highlights accessions with superior agronomic and bioactive potential, providing a robust framework for parental selection, core collection development, and targeted breeding strategies to enhance cacao competitiveness and resilience under changing climatic conditions.

1. Introduction

Climate change and demographic pressure pose critical challenges for agriculture in tropical regions, where limited availability of water and/or nutrients may constrain the advances in crop improvement []. In response, the development of cultivars resilient to fluctuating environmental conditions has emerged as a priority strategy to ensure sustainable productivity and reinforce food security. Within this framework, the systematic conservation and characterization of genetic diversity acquire scientific significance, as germplasm represents a strategic reservoir of adaptive alleles [].
The chocolate tree (Theobroma cacao L.) is a tropical cauliflorous species of the Malvaceae family [], domesticated approximately 3500 years BC in northwestern South America [,]. This chronology is supported by fossil evidence recovered from archaeological sites of pre-Columbian cultures such as Chinchipe (Ecuador) and Montegrande (Peru) []. Cocoa, a strategic raw material for the chocolate industry, represents a major global economic asset owing to its high added value derived from a biochemical composition rich in secondary metabolites and alkaloids with immunomodulatory properties [,]. Its economic relevance is further reflected in projections for the global chocolate market, estimated at USD 142.88 billion in 2024, with a sustained annual growth rate of 3.5%, expected to reach approximately USD 194.73 billion by 2033 [].
Globally, three main cacao groups have traditionally been recognized: Criollo, Forastero, and their hybrid Trinitario []. However, advances in phylogenomic analysis have enabled the reclassification of cacao diversity into eleven distinct genetic groups: (I) Amelonado, (II) Contamana, (III) Criollo, (IV) Curaray, (V) Guayana, (VI) Iquitos, (VII) Marañón, (VIII) Nacional, (IX) Nanay, (X) Purús, and (XI) Nacional Boliviano [,]. These groups are distributed throughout the neotropical regions of South and Central America, representing a gradient of domestication and bioclimatic adaptation. Within this geographic context, Peru emerges as a hotspot of genetic diversity, harboring seven of the eleven identified groups (Contamana, Curaray, Iquitos, Marañón, Nanay, Purús, and Nacional), thereby positioning the country as a reservoir of wild alleles and agronomically valuable variants [,].
In Peru, cacao cultivation extends across the mid and high-altitude jungle ecoregions, ranging from Cusco to Amazonas []. Among these, the Amazonas region is distinguished by the production of a wide diversity of cacao beans, which are renowned for their appealing sensory attributes and have earned the region the protected designation of origin Cacao Amazonas Perú (CAP) []. This distinction has facilitated the capitalization of growing market demand, substantially enhancing the international competitiveness of the product, considering that fine-flavor cocoa represents approximately 75% of Peru’s total cocoa exports [].
In the Amazonas region, the Nacional and Criollo varieties are the principal representatives of fine-flavor cacao []. Criollo is cultivated across 22.5% of the production area in the province of Bagua, 72.7% in Utcubamba, and 78.3% in Condorcanqui []. Owing to its wide distribution, cacao cultivation makes a significant contribution to regional development, benefiting broad sectors of the population, particularly those in vulnerable socioeconomic conditions.
To maximize crop profitability, farmers prioritize the selection of genotypes that exhibit desirable traits such as resilience to biotic and abiotic stresses, high fruit productivity with morphologically appealing pods, and large seeds possessing distinctive organoleptic profiles []. However, both intra- and inter-population genetic variability are known to fluctuate spatiotemporally under the influence of reproductive, geographical, and anthropogenic factors, thereby contributing to phenotypic heterogeneity []. Under these circumstances, the systematic characterization of a wide range of accessions becomes essential for elucidating advantageous gene combinations and identifying promising plant genetic resources [].
Phenotypic agrobiodiversity is conventionally assessed through the evaluation of qualitative and quantitative traits []. Among these, average seed weight stands out as a key quantitative descriptor due to its high heritability. Seed size and morphometric uniformity are regarded as priority attributes in the chocolate industry, thereby justifying their inclusion as fundamental criteria in the design of assisted genetic selection programs [,]. This perspective underscores the methodological relevance of morphological characterization as an analytical foundation for population genetics studies, while remaining accessible and farmer-friendly [,]. Consequently, deepening the understanding of which phenotypic traits discriminate the Amazonian germplasm will enable the identification and selection of elite accessions for crop improvement.
Numerous morphological descriptors have been documented in the literature for the characterization of commercially important crops, including Coffea arabica L. [], Amaranthus [], Hibiscus sabdariffa L. [], Phoenix dactylifera L. [], and Capsicum spp. [], among others. In the case of cacao, 65 agromorphological descriptors were established in the late twentieth century and have since been systematically adopted by international organizations such as the Tropical Agricultural Research and Higher Education Center (CATIE), the International Cocoa Genebank Trinidad (ICGT), and the International Cocoa Germplasm Database (ICGD), which have supported comparative studies and genomic conservation efforts for decades []. Building on this framework, a recent study conducted in northeastern Peru [] identified five groups within a population of 146 fine-flavor cacao ecotypes, classified as Toribianos, Utkus, Cajas, Indes, and Bagüinos. Among these, the latter two groups are notable for their adaptation to elevations above 500 m, exhibiting superior organoleptic profiles and high productivity, as evidenced by an improved pod index (<13.88).
In light of Peru’s recognized richness in plant genetic resources, this study explicitly addresses the existing knowledge gap regarding the definition of phenotypic criteria for differentiating and structuring the CAP germplasm. To this end, we systematically evaluated 113 accessions from the INIA Germplasm Bank using an integrated set of 51 standardized descriptors and implemented a multivariate analytical framework aimed at identifying discriminant traits and delimiting promising accessions with notable phytochemical potential. This approach seeks to generate robust and operational evidence to strengthen the position of Peruvian cacao and guide its selection, conservation, and genetic improvement under the current context of climate change.

2. Materials and Methods

2.1. Germplasm Collection

During 2016, plant material (scions) was collected from the middle third of the canopy of previously identified mother trees located in natural populations within the provinces of Bagua and Utcubamba (Figure 1). The selection of these specimens was carried out strategically, based on distinctive morphological characteristics recognized by local producers. This process allowed for the recording of passport data and the subsequent assignment of a unique “PER” code (Table S1), which identifies each accession conserved in the Germplasm Bank of the National Institute of Agrarian Innovation (INIA-Peru) [].
Figure 1. Cartographic collection of the 113 cacao accessions collected from the Amazonas region, Peru. The collection sites are shown for two provinces (Bagua and Utcubamba) using shared symbology but different map scales. The projection is based on the World Geodetic System 1984 (WGS 84), Universal Transverse Mercator (UTM) Zone 17S, Datum WGS84.
The study provinces, located in the Amazonas region, are characterized by pronounced geographic, climatic, and altitudinal variability, as previously documented by Vásquez-García []. This environmental heterogeneity creates an ecological mosaic that fosters a broad expression of phenotypic variability among the evaluated cacao accessions.
The germplasm collection, consisting of buds from 113 accessions, was grafted onto the universal rootstock IMC 67, known for its vegetative vigor and disease resistance. The rootstocks were previously cultivated under shade house conditions to ensure uniform and pathogen-free growth. The establishment was carried out through top cleft grafting during the same year of collection, using scions with a diameter of 10 mm on rootstocks transplanted to the final field (Figure 2). The grafts were installed within an agroforestry system following a Latin square experimental design, with a spacing of 3 m × 3 m. Each accession comprised nine grafts, individually identified with QR codes to ensure traceability. During the first two years, the system was intercropped with Musa paradisiaca L., while the forest species Cedrela odorata L., Inga edulis Mart., and Calycophyllum spruceanum Benth., planted at 10 m intervals, reached their developmental stage.
Figure 2. Schematic flow of the methodological framework applied in the study.

2.1.1. Location

The ex situ conservation of the cacao collection maintained by the Germplasm Bank of INIA–Peru was established at the Huarangopampa Experimental Center (EC), a subsidiary of the Amazonas Agrarian Experimental Station. This site is located in the district of El Milagro, Utcubamba province (5°39′50″ S, 78°32′01″ W), at an altitude of 480 m a.s.l., where the collection comprises a total of 1017 plants distributed across 1 ha (Figure S1).

2.1.2. Agronomic Management

During the establishment and maintenance phases, standardized cultural practices were implemented, including weed control, periodic irrigation according to plant water requirements, and split fertilization applied twice per season. The latter was determined based on the crop’s nutritional demands, as established through systematic soil analyses conducted throughout the evaluation period.
In addition, formative and maintenance pruning were consistently performed over six consecutive years, accompanied by continuous phytosanitary monitoring to ensure the timely detection of pests and diseases (Figure 2). During this period, graft viability was confirmed, and most accessions reached a stable productive stage. From this point onward, the evaluation of agromorphological traits was scheduled when the plants reached seven years of age (2023 season) and eight years (2024 season), encompassing two consecutive evaluation cycles.

2.1.3. Agroecological Conditions

The initial edaphic conditions of the site were characterized by a clay-texture soil, with a pH of 7.8 and an electrical conductivity of 9.9 mS/m. Regarding mineral availability, the soil analysis indicated concentrations of 3.9 mg/kg of phosphorus, 176.6 mg/kg of potassium, 7.2% total carbon, 0.7% organic matter, and 0.4 mg/g total nitrogen. Furthermore, the concentration of exchangeable bases was 26.5 cmol(+)/kg for calcium, 3.0 cmol(+)/kg for magnesium, and 0.6 cmol(+)/kg for sodium. These properties were determined through analytical characterization conducted by the Soil, Water, and Foliar Analysis Laboratory (LABSAF) of INIA, which is accredited according to the NTP-ISO/IEC 17025:2017 standard [] under registration No. LE-200.
During the characterization period, the climatic conditions in the collection area were characterized by a higher accumulated precipitation in 2023 (1040.9 mm) compared with 2024 (926.5 mm) []. Temperature remained relatively stable across both years (Figure S2), whereas relative humidity exhibited a marked decline between July and September in both periods, coinciding with the months of lowest rainfall.

2.2. Agromorphological Characterization

A total of 51 phenotypic descriptors were employed for the agromorphological characterization of the germplasm [,,], comprising 33 quantitative and 18 qualitative traits. The selection of these descriptors was based on their capacity to reveal the distinctive characteristics of each accession, encompassing key morphological structures such as leaves, flowers, fruits, and seeds (Figure 2).

2.2.1. Quantitative Descriptors

Leaf: LL = Leaf length (cm), LW = Leaf width (cm), PtL = Petiole length (cm).
Flower: PdL = Pedicel length (cm), SpL = Sepal length (mm), SW = Sepal width (mm), PL = Petal length (mm), PW = Petal width (mm), LgW = Ligule width (mm), FL = Filament length (mm), StL = Staminode length (mm), SL = Style length (mm), OL = Ovary length (mm), OW = Ovary width (mm).
Fruit: SH = Shell hardness (MPa), FM = Fruit mass (g), FrL = Fruit length (cm), FW = Fruit width (cm), PrT = Pericarp thickness (cm), GD = Groove depth (cm), PM = Pericarp mass (g), NL = Number of locules (unit), TSS = Total soluble solids (°Brix).
Seed: NSL = Number of seeds per locule (unit), FSMF = Fresh seed mass per fruit (g), NSF = Number of seeds per fruit (unit), NIS = Number of intact seeds (unit), NES = Number of empty seeds (unit), SeL = Seed length (mm), SD = Seed diameter (mm), ST = Seed thickness (mm), SI = Seed index (g) and PI = Pod index (unit). The mathematical equations used to calculate SI and PI are provided below []:
$$SI = \frac{\text{Dry mass of } n \text{ seeds (g)}}{n}$$
$$PI = \frac{1000\ \text{(g)}}{\text{Average seed number} \times \text{Average mass of dry seed (g)}}$$
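The two indices can be illustrated with a short Python sketch. This is only an illustration of the formulas above, not the authors' code (the study used R); the function names and the example accession values are hypothetical.

```python
def seed_index(dry_mass_g: float, n_seeds: int) -> float:
    """Seed index (SI): mean dry mass of a single seed, in grams."""
    if n_seeds <= 0:
        raise ValueError("n_seeds must be positive")
    return dry_mass_g / n_seeds


def pod_index(avg_seeds_per_pod: float, avg_dry_seed_mass_g: float) -> float:
    """Pod index (PI): number of pods needed to obtain 1000 g of dry seed."""
    dry_seed_per_pod_g = avg_seeds_per_pod * avg_dry_seed_mass_g
    if dry_seed_per_pod_g <= 0:
        raise ValueError("seed number and seed mass must be positive")
    return 1000.0 / dry_seed_per_pod_g


# Hypothetical accession: 100 dry seeds weigh 120 g, 40 seeds per pod.
si = seed_index(120.0, 100)
pod_idx = pod_index(40, si)
print(f"SI = {si:.2f} g, PI = {pod_idx:.2f}")
```

Because PI is the reciprocal of dry seed mass per pod, a lower PI means fewer pods are needed per kilogram of dry seed, which is why lower values are described as favorable.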

2.2.2. Qualitative Descriptors

The qualitative morphological descriptors are detailed in Table 1.

2.3. Phytochemical Characterization

Fresh cotyledons from the promising accessions, together with a representative accession to ensure the inclusion of all phenotypic groups in the analysis, were preserved at −80 °C for 24 h and subsequently lyophilized under a pressure of 0.003 mbar for 72 h (Labconco, Corp., Kansas City, MO, USA, −84 °C). The resulting samples were ground, sieved (850 μm), and defatted following the protocol adapted from Cortez []. The extract was filtered with Whatman paper No. 40 and stored in amber Eppendorf-type tubes at −22 °C. All analyses were performed in triplicate.

2.3.1. Colorimetric Measurement, Titratable Acidity, and pH

The color of the lyophilized powder was determined in the CIELAB color space (L*, a*, b*), using a CR-400/DP-400 digital colorimeter (Konica Minolta Inc., Tokyo, Japan). Simultaneously, 10 g of sample were dissolved in 50 mL of Milli-Q water at 100 °C. The mixture was briefly vortexed and filtered using Whatman No. 40 paper. The pH of the solution was measured using an HI2211 pH meter (Hanna Instruments, Woonsocket, RI, USA). Titratable acidity (TA) was determined by titration to pH 8.1 using 0.1 N NaOH, incorporating three drops of phenolphthalein as an indicator. The results were expressed as a percentage of acetic acid equivalent.

2.3.2. Bioactive Compound Profile

The antioxidant capacity was evaluated using the DPPH free radical scavenging assay, following the procedure described by Brand-Williams [], with minor modifications. For this purpose, 100 μL of the extract was mixed with 3.9 mL of DPPH solution in amber Eppendorf tubes, and the absorbance was measured at 515 nm using a UV/Vis spectrophotometer (Genesys 180, Thermo Scientific™, Madison, WI, USA). Quantification was carried out using Trolox standards (y = 0.2321x − 1.2859; R² = 0.9978), and the results were expressed as milligrams of Trolox equivalents (TE) per gram of sample (mg TE/g).
Total phenolic content (TPC) was determined using the Folin–Ciocalteu reagent []. For this, 500 μL of the extract was combined with 2.5 mL of 10% reagent and 2 mL of Na₂CO₃ (4% w/v) in amber tubes. The absorbance was then measured at 750 nm using a UV/Vis spectrophotometer. Concentrations were calculated from a gallic acid calibration curve (y = 0.0036x + 0.1872; R² = 0.9982), and the results were expressed as milligrams of gallic acid equivalents (GAE) per gram of sample (mg GAE/g).
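As a rough illustration of how such calibration curves are applied, the Python sketch below inverts a linear curve y = ax + b to recover x from a measured response. It assumes that y is the instrument response and x the standard concentration in whatever units the curves were built with; the text does not state those units, and the conversion to mg per gram of sample (which depends on extract volume and sample mass) is deliberately left out. The function names and example readings are hypothetical.

```python
# Slope and intercept of the two calibration curves reported in the text.
TROLOX_CURVE = (0.2321, -1.2859)
GALLIC_ACID_CURVE = (0.0036, 0.1872)


def concentration_from_response(y: float, slope: float, intercept: float) -> float:
    """Invert a linear calibration curve y = slope * x + intercept."""
    if slope == 0:
        raise ValueError("slope must be non-zero")
    return (y - intercept) / slope


def mean_of_replicates(readings, slope, intercept):
    """Average the back-calculated concentrations of replicate readings."""
    values = [concentration_from_response(y, slope, intercept) for y in readings]
    return sum(values) / len(values)


# Hypothetical triplicate responses for one extract (gallic acid curve).
triplicate = [0.545, 0.550, 0.547]
print(round(mean_of_replicates(triplicate, *GALLIC_ACID_CURVE), 2))
```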

2.3.3. HPLC Screening of Methylxanthines and Phenolic Compounds

Individual standard solutions of theobromine, caffeine, caffeic acid, catechin, epicatechin, and cyanidin 3-O-glucoside were prepared, with concentration ranges and retention times established according to the optimized protocol described by Cortez []. The analysis was performed using an ultra-high-performance liquid chromatograph (Agilent 1290 Infinity II, Agilent Technologies, Santa Clara, CA, USA) equipped with a Zorbax Eclipse Plus C18 column (4.6 mm × 250 mm, 5 μm) coupled to an automated diode-array detector (DAD). The operational conditions included an injection volume of 20 μL, a total run time of 22 min, and a constant temperature of 26 °C. Detection was conducted at 280 nm, and the results were expressed as milligrams of standard equivalent per gram of dehydrated sample (mg/g).

2.3.4. FTIR Spectroscopy Screening

The characterization of functional groups in cacao samples was conducted using Fourier transform infrared spectroscopy in attenuated total reflectance mode (FTIR-ATR). A total of 50 mg of freeze-dried cotyledons were analyzed using a Nicolet iS50 spectrophotometer (Thermo Scientific, Madison, WI, USA), with background correction performed through Omnic v9.2 software. Subsequently, the spectra were recorded at room temperature, averaging 50 scans per sample, over a spectral range of 4000–500 cm−1, with a resolution of 4 cm−1.

2.4. Statistical Processing

Statistical analyses and graphical representation of data pooled from two consecutive years of agromorphological characterization were performed using the R environment v4.4.3 [], applying a combined univariate and multivariate analytical approach (Figure 2).

2.4.1. Analysis of Frequencies and Genetic Components

Quantitative data were initially cleaned by detecting and excluding outliers using the outliers package v0.15 []. Subsequently, the variability of quantitative traits was described with the summarytools package v1.0.1 [], calculating the mean, standard deviation (DevSt), extreme values (max and min), and coefficient of variation (CV). Frequency distributions were graphically represented with the ggplot2 package v3.5.2 [], employing histograms for quantitative descriptors and bar plots for qualitative ones.
Using the variability package v0.1.0 [], which integrates the respective equations [], the following genetic parameters were estimated: genotypic variance (GV) and phenotypic variance (PV), genotypic and phenotypic coefficients of variation (GCV and PCV), broad-sense heritability (H2), selection response expressed as genetic advance (GA), and genetic advance as a percentage of the mean (GAM).
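The genetic parameters listed above are commonly estimated from the mean squares of an analysis of variance. The Python sketch below uses the widely cited estimators (genotypic variance from the difference of mean squares, genetic advance with a selection differential k = 2.06 at 5% selection intensity). Whether the variability package uses exactly these expressions is an assumption, as are the function name, the input values, and the choice of what counts as a replication.

```python
import math


def genetic_parameters(ms_genotype, ms_error, replications, trait_mean, k=2.06):
    """Variance components and derived parameters from ANOVA mean squares.

    k is the standardized selection differential (2.06 for 5% intensity).
    """
    # Genotypic variance; negative estimates are truncated at zero.
    gv = max((ms_genotype - ms_error) / replications, 0.0)
    pv = gv + ms_error                      # phenotypic variance
    h2 = gv / pv if pv > 0 else 0.0         # broad-sense heritability (fraction)
    ga = k * math.sqrt(pv) * h2             # genetic advance, in trait units
    return {
        "GV": gv,
        "PV": pv,
        "GCV": 100 * math.sqrt(gv) / trait_mean,
        "PCV": 100 * math.sqrt(pv) / trait_mean,
        "H2": 100 * h2,
        "GA": ga,
        "GAM": 100 * ga / trait_mean,
    }


# Hypothetical trait: genotype MS = 10, error MS = 2, 2 replications, mean = 20.
params = genetic_parameters(10.0, 2.0, 2, 20.0)
for name, value in params.items():
    print(f"{name}: {value:.2f}")
```

With these estimators PCV is never smaller than GCV, and the gap between the two reflects the share of variation not attributed to genotype.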

2.4.2. Correlation and Principal Component Analysis

The magnitude and direction of phenotypic associations among pairs of quantitative descriptors were assessed using Pearson’s correlation coefficient, calculated with the Hmisc package v5.2-3 []. To identify the quantitative descriptors contributing most to the total variability of the germplasm, a principal component analysis (PCA) was performed on the standardized data matrix (mean = 0; standard deviation = 1) using the FactoMineR v2.12 [] and factoextra v1.0.7 [] packages. Eigenvalues and eigenvectors were computed for each principal component, and only those components with eigenvalues greater than 1 and a cumulative variance explaining at least 75% of the total variability were retained [,]. Finally, the first two principal components were represented in a biplot generated with the ggplot2 package.
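The correlation coefficient and the component retention rule described above can be sketched in a few lines of Python. This is a simplified illustration rather than the FactoMineR workflow: the eigenvalues are supplied directly instead of being computed from a data matrix, and the function names and example values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def retained_components(eigenvalues, min_eigenvalue=1.0, min_cumulative=0.75):
    """Keep components with eigenvalue > 1 and check the cumulative variance.

    Returns (number kept, cumulative variance fraction, threshold met?).
    """
    total = sum(eigenvalues)
    kept = [ev for ev in sorted(eigenvalues, reverse=True) if ev > min_eigenvalue]
    cumulative = sum(kept) / total
    return len(kept), cumulative, cumulative >= min_cumulative


# Hypothetical eigenvalues for five standardized descriptors (they sum to 5).
print(retained_components([2.4, 1.5, 0.6, 0.3, 0.2]))
print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))
```

For a standardized matrix the eigenvalues sum to the number of descriptors, so an eigenvalue above 1 marks a component that explains more variance than any single original variable.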

2.4.3. Cluster Analysis and Genetic Distance

Hierarchical Cluster Analysis (HCA) was conducted using Euclidean distance and Ward’s minimum variance method (Ward.D2), implemented in the cluster package v2.1.8.1 []. The optimal number of clusters (K) was initially determined through the average silhouette index calculated with the factoextra package, and subsequently verified using the Gap Statistic method from the cluster package. Cluster stability was assessed by multiscale bootstrap resampling with the pvclust package v2.2-0 []. The graphical representation of the clustering was generated as a heatmap with associated dendrograms using the ComplexHeatmap v2.24.1 [] and dendextend v1.19.1 [] packages.
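To make the silhouette criterion concrete, the Python sketch below computes the average silhouette width for a given partition using Euclidean distance. It is a from-scratch illustration under simple assumptions (small data, singleton clusters scored as 0), not the factoextra implementation; the toy points and labels are hypothetical.

```python
import math


def euclidean(p, q):
    """Euclidean distance between two points given as sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def average_silhouette(points, labels):
    """Mean silhouette width of a partition (higher means better separation)."""
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)
    if len(clusters) < 2:
        raise ValueError("at least two clusters are required")

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:                       # singleton cluster: score 0 by convention
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster
        a = sum(euclidean(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(
            sum(euclidean(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items()
            if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)


# Toy data: two well-separated pairs of points.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
print(average_silhouette(pts, [0, 0, 1, 1]))   # natural grouping
print(average_silhouette(pts, [0, 1, 0, 1]))   # poor grouping
```

Evaluating this score over candidate values of K and choosing the maximum is the usual way the index is used to suggest a number of clusters.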
Genetic distance analysis was performed based on the Euclidean distance matrix and the cluster vector derived from the HCA. For this purpose, the customized function “cluster_distance” was used to compute the average inter- and intra-group distances.
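The body of the customized function is not given in the text. The Python sketch below is one plausible reading of what such a helper computes: the mean pairwise distance within each group and between each pair of groups, taken from a square distance matrix and a vector of cluster labels. Only the function name comes from the text; the implementation and the toy data are assumptions.

```python
def cluster_distance(dist, labels):
    """Average intra- and inter-group distances.

    dist   : square, symmetric distance matrix (list of lists)
    labels : cluster label of each row/column
    Returns a dict keyed by (group_a, group_b); equal keys are intra-group.
    """
    sums, counts = {}, {}
    n = len(labels)
    for i in range(n):
        for j in range(i + 1, n):          # each unordered pair once
            key = tuple(sorted((labels[i], labels[j])))
            sums[key] = sums.get(key, 0.0) + dist[i][j]
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy example: four accessions on a line at positions 0, 1, 10, and 12.
positions = [0, 1, 10, 12]
dist = [[abs(a - b) for b in positions] for a in positions]
result = cluster_distance(dist, ["A", "A", "B", "B"])
print(result)
```

A group with a single member has no pairs and therefore no intra-group entry in the result.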
Following the identification of phenotypic groups through HCA, the assumptions of normality (Kolmogorov–Smirnov test) and homogeneity of variances (Bartlett’s test) were verified, with both considered satisfied when p ≥ 0.05. Differences among groups were evaluated using an unbalanced one-way analysis of variance (ANOVA), in which each accession was treated as a replicate within its respective phenotypic group, according to a fixed-effects linear model [].
$$Y_{ij} = \mu + \alpha_i + \varepsilon_{ij}$$
where $Y_{ij}$ represents the $j$th observation within the $i$th phenotypic group, $\mu$ denotes the overall mean of the descriptor, $\alpha_i$ corresponds to the fixed effect of the $i$th phenotypic group, and $\varepsilon_{ij}$ represents the independent random error.
Multiple comparisons were performed using Tukey’s HSD test at a 95% confidence level, implemented in the agricolae package v1.3-7 [].
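The fixed-effects model above leads to the usual partition of the total sum of squares into between-group and within-group parts. The Python sketch below computes the F statistic for groups of unequal size. It is an illustration with hypothetical data; it does not compute the p-value or the Tukey comparisons, which the authors obtained in R.

```python
def one_way_anova(groups):
    """One-way fixed-effects ANOVA for groups of possibly unequal size.

    groups : dict mapping group label -> list of observations
    Returns sums of squares, degrees of freedom, and the F statistic.
    """
    all_values = [x for values in groups.values() for x in values]
    n_total, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / n_total

    ss_between = 0.0
    ss_within = 0.0
    for values in groups.values():
        group_mean = sum(values) / len(values)
        # Between-group part: group size times squared deviation of its mean.
        ss_between += len(values) * (group_mean - grand_mean) ** 2
        # Within-group part: squared deviations from the group's own mean.
        ss_within += sum((x - group_mean) ** 2 for x in values)

    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return {
        "SS_between": ss_between,
        "SS_within": ss_within,
        "df_between": df_between,
        "df_within": df_within,
        "F": f_stat,
    }


# Hypothetical descriptor values for three phenotypic groups of unequal size.
table = one_way_anova({"G1": [1, 2, 3], "G2": [4, 5, 6, 7], "G3": [8, 9]})
print(table)
```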

2.4.4. Analysis of Promising Accessions

Promising accessions in terms of productivity were identified using a bivariate linear mixed model under Bayesian inference, implemented in the MCMCglmm package [] with Monte Carlo simulations through Markov chains and non-informative prior distributions []. Convergence was assessed with the coda v0.19.4 package [], and the Bayesian Yield Stability Index (BYSI) was calculated as the minimum yield with a 90% probability of being exceeded [], with uncertainty expressed through highest posterior density (HPD) intervals. The results were summarized in a biplot generated using ggplot2.
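Read literally, a yield that is exceeded with 90% probability is the 10th percentile of the posterior distribution of yield. The Python sketch below shows only that final quantile step on hypothetical posterior draws. The mixed model itself, the priors, and the exact BYSI computation used by the authors are not reproduced, and the function names are illustrative.

```python
def quantile(samples, q):
    """Empirical quantile with linear interpolation between order statistics."""
    ordered = sorted(samples)
    pos = q * (len(ordered) - 1)
    lower = int(pos)
    upper = min(lower + 1, len(ordered) - 1)
    frac = pos - lower
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])


def bysi(posterior_yield_samples, exceed_prob=0.90):
    """Yield level exceeded with probability exceed_prob (a lower quantile)."""
    return quantile(posterior_yield_samples, 1.0 - exceed_prob)


# Two hypothetical accessions with the same mean yield (20) but different spread.
stable = [19.0 + 0.1 * i for i in range(21)]      # draws from 19 to 21
unstable = [10.0 + 1.0 * i for i in range(21)]    # draws from 10 to 30
print(round(bysi(stable), 2), round(bysi(unstable), 2))
```

Under this reading, two accessions with equal mean yield are ranked by how much of their posterior mass falls at low yields, so the less variable accession scores higher.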
Differences among phytochemical parameters were determined using a one-way analysis of variance (ANOVA, p < 0.05), after verifying data normality with the Shapiro–Wilk test (p ≥ 0.05). Multiple comparisons were conducted using Tukey’s HSD test at a 95% confidence level with the agricolae package, and results were visualized using ggplot2 and tidyverse v2.0.0 []. Additionally, CIELAB coordinates were graphically represented using ggplot2 and ggforce v0.5.0 [], while FTIR spectra were normalized and plotted using the ChemoSpec v6.3.0 [] and ggplot2 packages.

3. Results

3.1. Variation Patterns in Quantitative and Qualitative Descriptors

The quantitative assessment of 113 cacao accessions revealed substantial phenotypic diversity, supported by the values of CV, means, and ranges (Figure 3). The analysis of the pooled data from two evaluation campaigns showed moderate variability among foliar descriptors. LL ranged from 24.38 to 30.03 cm (CV = 4.4%), with a higher frequency of accessions occurring between 28.00 and 28.50 cm. LW varied between 8.50 and 11.32 cm (CV = 6.52%), with the highest frequency of accessions clustered in the 9.00–9.75 cm range.
Figure 3. Distribution histograms of the germplasm for the quantitative descriptors. LL = Leaf length (cm); LW = Leaf width (cm); PtL = Petiole length (cm); PdL = Pedicel length (cm); SpL = Sepal length (mm); SW = Sepal width (mm); PL = Petal length (mm); PW = Petal width (mm); LgW = Ligule width (mm); FL = Filament length (mm); StL = Staminode length (mm); SL = Style length (mm); OL = Ovary length (mm); OW = Ovary width (mm); FM = Fruit mass (g); FrL = Fruit length (cm); FW = Fruit width (cm); PrT = Pericarp thickness (cm); GD = Groove depth (cm); PM = Pericarp mass (g); TSS = Total soluble solids (°Brix); NSL = Number of seeds per locule (Unit); FSMF = Fresh seed mass per fruit (g); NSF = Number of seeds per fruit (Unit); SI = Seed index (g); PI = Pod index (Unit); NIS = Number of intact seeds (Unit); SeL = Seed length (mm); SD = Seed diameter (mm); ST = Seed thickness (mm).
Regarding floral traits, PdL ranged from 1.15 to 2.47 cm (CV = 16.42%), with the majority of accessions concentrated between 1.30 and 1.80 cm. SpL exhibited the highest frequency of accessions between 8.00 and 8.50 mm (CV = 13.57%). StL ranged from 5.25 to 8.92 mm (CV = 10.46%). Finally, OL within the evaluated germplasm fluctuated between 1.03 and 2.33 mm (CV = 15.51%), with the highest frequency occurring within the 1.60 to 1.70 mm interval.
Concerning fruit characteristics, FM exhibited a broad range of variation from 302.17 to 1599.33 g (CV = 32.53%), showing a higher frequency of accessions within the 500 to 900 g interval (Figure 3). FrL showed the highest frequency of accessions between 16 and 17 cm, whereas FW ranged from 8.80 to 10.00 cm (CV = 9.61%). Fruits were primarily characterized by a PrT ranging from 1.28 to 3.22 cm (CV = 19.83%), with the highest proportion of accessions between 1.64 and 2.40 cm, and an average SH of 0.36 MPa (Figure S3).
The sweetness of the mucilage covering the cotyledons (TSS) reached up to 25.25 °Brix (CV = 17.29%), with the highest frequency of accessions concentrated between 15 and 17 °Brix. Likewise, most accessions produced between 32.5 and 50 seeds per fruit (NSF; CV = 16.42%) and an average of only 4.03 empty seeds (Figure S3). Regarding individual seed mass (SI), the highest frequency of accessions fell within the 1.10 to 1.20 g range, reaching up to 2 g (CV = 21.25%). These attributes are associated with larger seed dimensions, including SeL and ST, which ranged from 10.53 to 17.12 mm and from 6.90 to 12.83 mm, respectively. Additionally, PI values ranged between 11.05 and 44.90, with a mean of 20.57.
Qualitative characters (Table 1) were classified into distinct categories according to the phenotypic variability detected, adjusting the number of classes to the specificity of each morphotype. This approach enabled a systematic characterization of the diversity, revealing both frequent traits and others with uncommon occurrence within the evaluated set (Figure 4 and Figure 5). Among the 113 accessions analyzed, the apiculate LAS trait was predominant, being observed in 112 accessions. Regarding the LBS, an even distribution was observed between acute and cordiform shapes (n = 44 each), while the obtuse shape was less frequent (n = 25). Concerning the YLC, brown (n = 50) predominated, followed by red (n = 47), both being more represented than green (n = 16).
Table 1. Morphological descriptors evaluated in the characterization of cacao germplasm.
Figure 4. Frequency distribution of the germplasm for qualitative descriptors. LBS = Leaf base shape; LAS = Leaf apex shape; YLC = Young leaf color; PC = Pedicel color; ASe = Anthocyanin in sepals; ASt = Anthocyanin in staminodes; AF = Anthocyanin in filament; AO = Anthocyanin in ovary; AL = Anthocyanin in ligule; IFC = Immature fruit color; MFC = Mature fruit color; FS = Fruit shape; FAS = Fruit apex shape (1 = Attenuate, 3 = Obtuse, 4 = Rounded, 5 = Apezonate, and 6 = Dentate); FBC = Fruit basal constriction; FR = Fruit roughness; STS = Seed transversal shape; SLS = Seed longitudinal shape; SC = Seed color.
Figure 5. Phenotypic diversity among cacao accessions from the INIA Germplasm Bank, collected in the Amazonas region of Peru.
In floral structures, most accessions exhibited a PC with a reddish green hue (n = 81), while a smaller number displayed a green hue (n = 32). Regarding anthocyanin presence, variation was observed across distinct floral organs within the collection: ASe (n = 44), AF (n = 2), AO (n = 27), and AL (n = 5). Conversely, ASt was consistently observed in all accessions (n = 113), suggesting potential genetic fixation of this trait.
For fruit descriptors, IFC was predominantly green (n = 80), followed by pigmented green (n = 30). In contrast, MFC was mainly yellow (n = 78), with lower frequencies of orange (n = 25) and red (n = 10) coloration. With respect to FS, elliptical (n = 65) and oblong (n = 39) forms were dominant, whereas rounded (n = 7) and ovate (n = 2) morphologies were uncommon. FR was mostly classified as mild (n = 42), intermediate (n = 39), or rough (n = 29).
Regarding seed traits, the STS was almost evenly distributed between flattened (n = 55) and intermediate (n = 58) categories, while the SLS was predominantly elliptical (n = 57). For SC, purple (n = 69) and violet (n = 42) were the most frequent, whereas white and pink were rare (n = 1 each), suggesting either recessive inheritance or low phenotypic expression of these color variants within the evaluated germplasm.

3.2. Estimation of Quantitative Genetic Parameters

Phenotypic characterization constitutes the foundation for assessing the diversity within a germplasm collection; however, its effectiveness is intrinsically conditioned by the degree of association between the observable phenotypic expressions and their underlying genetic basis (Table 2).
Table 2. Estimation of genetic variation indicators in quantitative descriptors.
In the leaf descriptors, all traits exhibited low GCV and GAM values (<10%), indicating limited genetic variability and low potential for selection response. Furthermore, only LW and LL showed moderate broad-sense heritability estimates (H2 > 30%), suggesting the influence of transmissible genetic effects. In contrast, within the floral descriptors, several traits such as PdL, SW, and OW exhibited moderate GCV values (>10%), whereas FL stood out with the highest value (32.36%). For PCV, the traits SW, FL and SL showed elevated values (>20%). Overall, H2 was generally moderate to high (>30%) for most floral descriptors, except for SL, which recorded a markedly low value (1.55%). Moreover, the GAM exceeded 10% for ten of the eleven floral traits assessed, indicating a moderate to high genetic potential and suggesting an adequate capacity for response to selection.
For the fruit descriptors, FM and PM presented high values of GCV, PCV, and GAM (>20%), suggesting strong potential for genetic improvement through selection. Meanwhile, SH, NL, and TSS exhibited maximum H2 values (100%), indicating that the observed variation is completely attributable to stable genetic effects. Regarding seed characteristics, FSMF and NES showed high GCV (21.31% and 28.48%, respectively). Notably, FSMF exhibited both moderately high H2 (54.48%) and high GAM (32.41%), reflecting strong potential for selection response. Finally, PI and NES reached PCV values above 30%, highlighting a greater influence of environmental factors on their phenotypic expression.
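The indicators in Table 2 are linked by standard quantitative-genetics formulas, and a minimal sketch can make that link explicit. The function below assumes the conventional definitions (GCV and PCV as the square roots of the genotypic and phenotypic variances over the trait mean, H2 as their variance ratio, and GAM = k × H2 × PCV) with k = 2.06, which corresponds to a 5% selection intensity. These formulas and the variance components used here are assumptions for illustration; how the variances were estimated is not described in this section. Under these assumptions, the values reported for FSMF (GCV = 21.31%, H2 = 54.48%) give a GAM close to the reported 32.41%.

```python
import math

def genetic_parameters(var_g, var_p, mean, k=2.06):
    """Return GCV, PCV, H2 and GAM, all expressed in percent.

    var_g: genotypic variance; var_p: phenotypic variance; mean: trait mean.
    k: standardized selection differential (2.06 assumes 5% selection intensity).
    """
    h2 = var_g / var_p                    # broad-sense heritability (fraction)
    gcv = math.sqrt(var_g) / mean * 100   # genotypic coefficient of variation
    pcv = math.sqrt(var_p) / mean * 100   # phenotypic coefficient of variation
    ga = k * h2 * math.sqrt(var_p)        # expected genetic advance (trait units)
    gam = ga / mean * 100                 # genetic advance as percent of the mean
    return {"GCV": gcv, "PCV": pcv, "H2": h2 * 100, "GAM": gam}

# Hypothetical variance components, chosen only so that GCV = 21.31% and
# H2 = 54.48%, the values reported for FSMF; the mean of 100 is arbitrary.
mean = 100.0
var_g = 21.31 ** 2
var_p = var_g / 0.5448
fsmf = genetic_parameters(var_g, var_p, mean)
print({name: round(value, 2) for name, value in fsmf.items()})
```

Because GAM depends on both H2 and PCV, a trait with very high heritability but little phenotypic spread can still show a low expected response to selection.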

3.3. Correlation and Principal Component Analysis

The Pearson correlation coefficient revealed distinct levels of association among the evaluated morphological structures (Table S2). Strong correlations were identified primarily among fruit and seed traits, with notable associations between FM and PM (0.95***), between FW and PM (0.80***), between NSL and both NSF (0.87***) and NIS (0.89***), and between NSF and NIS (0.93***). Moderate correlations were heterogeneously distributed across the analyzed morphometric structures, with a higher incidence in floral descriptors, where the correlation coefficient (r) ranged from 0.30 to 0.46. Overall, most traits exhibited weak correlations, suggesting that the quantitative descriptors tend to vary independently, reflecting the influence of distinct genetic control mechanisms and environmental modulation.
The PCA enabled the identification of variation patterns and relationships among the 33 quantitative descriptors evaluated in the cacao accessions (Figure 6). The first twelve dimensions (eigenvalues > 1) together accounted for 77.67% of the total variance (Table S3). The first dimension, which explained 18.8% of the variation, was dominated by fruit descriptors, among which FM (11.31%), PM (10.62%), and FrL (10.18%) showed the highest positive contributions (Figure 6a,b, Table S3), reflecting their close association with fruit size and weight characteristics. The second dimension explained 10.5% of the variability and was mainly influenced by seed descriptors such as NSF (−0.66), NIS (−0.59), and NSL (−0.55), which contributed 12.76%, 10.05%, and 8.92%, respectively (Figure 6a–c, Table S3), to the formation of this dimension, associated with higher yield potential. Finally, the homogeneous distribution of accessions around the biplot center (Figure 6a) indicates a consistent level of heterogeneity within the studied germplasm.
Figure 6. Principal component analysis. (a) Biplot of Dim-1 and Dim-2 illustrating the distribution of the 113 accessions and the projection of the 33 quantitative descriptors within a 95% confidence ellipse; (b,c) contribution of the descriptors to the formation of the first two principal dimensions. LL = Leaf length (cm); LW = Leaf width (cm); PtL = Petiole length (cm); PdL = Pedicel length (cm); SpL = Sepal length (mm); SW = Sepal width (mm); PL = Petal length (mm); PW = Petal width (mm); LgW = Ligule width (mm); FL = Filament length (mm); StL = Staminode length (mm); SL = Style length (mm); OL = Ovary length (mm); OW = Ovary width (mm); SH = Shell hardness (MPa); FM = Fruit mass (g); FrL = Fruit length (cm); FW = Fruit width (cm); PrT = Pericarp thickness (cm); GD = Groove depth (cm); PM = Pericarp mass (g); NL = Number of locules (Unit); TSS = Total soluble solids (°Brix); NSL = Number of seeds per locule (Unit); FSMF = Fresh seed mass per fruit (g); NSF = Number of seeds per fruit (Unit); SI = Seed index (g); PI = Pod index (Unit); NIS = Number of intact seeds (Unit); NES = Number of empty seeds (Unit); SeL = Seed length (mm); SD = Seed diameter (mm); ST = Seed thickness (mm).
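The relationship between the coordinates and the percentage contributions reported for the second dimension can be checked with a short calculation. The sketch below assumes a PCA on standardized variables, so that the total variance equals the number of descriptors (33) and the eigenvalue of a dimension is its explained fraction times 33; the contribution of a descriptor is then its squared coordinate divided by that eigenvalue. The function name and this reading of the analysis are assumptions, and small differences from the reported percentages are expected because the inputs are rounded.

```python
def contribution_pct(coord, explained_fraction, n_variables):
    """Percent contribution of one descriptor to one principal dimension.

    coord: coordinate (correlation) of the descriptor on the dimension.
    explained_fraction: share of total variance explained by the dimension.
    n_variables: number of standardized descriptors in the PCA.
    """
    eigenvalue = explained_fraction * n_variables
    return coord ** 2 / eigenvalue * 100

# Coordinates on the second dimension as reported in the text.
dim2_coords = {"NSF": -0.66, "NIS": -0.59, "NSL": -0.55}
contrib = {
    name: contribution_pct(coord, explained_fraction=0.105, n_variables=33)
    for name, coord in dim2_coords.items()
}
for name, value in contrib.items():
    print(f"{name}: {value:.2f}%")
```

With these rounded inputs the results fall within about 0.2 percentage points of the reported contributions (12.76%, 10.05%, and 8.92%).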

3.4. Structural Organization of Germplasm

The HCA based on quantitative descriptors identified eight phenotypic clusters (K = 8), with an average silhouette index of 0.20 (Figure S4), indicating an overall acceptable partitioning of variability within the germplasm collection. The eight clusters are displayed with distinct colors along the dendrogram rows (Figure 7). Moreover, the stability of the hierarchical structure, assessed through the multiscale bootstrap resampling, revealed that Clusters II and III exhibited high robustness, with approximately unbiased (AU) support values ≥ 95%, confirming their phenotypic consistency within the analyzed dataset.
Figure 7. Dendrogram and heatmap generated through multivariate hierarchical clustering applied to 113 cacao accessions. Columns represent the different quantitative traits, where higher values are indicated by greater blue intensity and lower values by deeper red intensity. Quantitative descriptors are defined in Section Quantitative Descriptors of the paper.
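For readers unfamiliar with the silhouette index used to assess the partition, the following minimal sketch computes the average silhouette for a given labeling. It assumes Euclidean distances and uses hypothetical toy points, not the accession data. For each point, a is the mean distance to the other members of its own cluster and b is the lowest mean distance to any other cluster; the score (b - a) / max(a, b) approaches 1 for compact, well-separated clusters and approaches 0 when clusters overlap.

```python
import math

def mean_silhouette(points, labels):
    """Average silhouette width; requires at least two clusters."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            scores.append(0.0)  # convention for singleton clusters
            continue
        # The distance of the point to itself is 0, so divide by len(own) - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Hypothetical toy data: two tight groups far apart.
toy_points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
good = mean_silhouette(toy_points, [1, 1, 2, 2])
bad = mean_silhouette(toy_points, [1, 2, 1, 2])
print(round(good, 3), round(bad, 3))
```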
A heterogeneous distribution of accessions was observed among clusters (Table S4), with Cluster VII comprising the largest proportion of accessions (24.78%), followed by Cluster VIII (17.70%) and III (15.04%), demonstrating a higher phenotypic representativeness within these groups. In contrast, Cluster IV exhibited the lowest representativeness (7.08%), whereas Clusters II and V displayed similar proportions (8.85%). Notably, according to the collection sites (Figure 1), accessions within each cluster originated from a broad range of altitudinal gradients, suggesting that the observed clustering pattern is primarily driven by intrinsic genetic variability, attributable to the genotype of each accession.
The data presented in Table 3 indicate that Cluster VIII exhibits the highest internal heterogeneity (intra-cluster distance = 7.54) and an average distance of 8.95, revealing substantial genetic variability among its accessions and positioning it as a promising source for broadening the parental diversity in breeding programs. Likewise, the greatest genetic divergences among clusters were observed between Clusters I and VII (10.03) and between Clusters I and VIII (9.85), suggesting that crosses among these groups could maximize heterosis. Conversely, the cluster pairs II–III (7.41) and V–VII (7.46) exhibited the lowest genetic distances, indicating greater genetic similarity and, consequently, a reduced potential for complementary genetic combinations in their progenies.
Table 3. Genetic distances among clusters.
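One common way to obtain intra- and inter-cluster distances of the kind summarized in Table 3 is to average pairwise Euclidean distances between accessions. The sketch below illustrates that approach on hypothetical two-dimensional points; the exact distance measure and averaging scheme behind Table 3 are not stated in this section, so both functions should be read as an assumption.

```python
import math
from itertools import combinations, product

def intra_cluster_distance(members):
    """Mean pairwise distance among the members of one cluster."""
    pairs = list(combinations(members, 2))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

def inter_cluster_distance(members_a, members_b):
    """Mean distance over all pairs formed by one member from each cluster."""
    pairs = list(product(members_a, members_b))
    return sum(math.dist(p, q) for p, q in pairs) / len(pairs)

# Hypothetical standardized trait vectors for two small clusters.
cluster_a = [(0.0, 0.0), (0.0, 2.0)]
cluster_b = [(3.0, 0.0), (3.0, 4.0)]

intra_a = intra_cluster_distance(cluster_a)
intra_b = intra_cluster_distance(cluster_b)
inter_ab = inter_cluster_distance(cluster_a, cluster_b)
print(intra_a, intra_b, round(inter_ab, 3))
```

In this reading, a cluster with a large intra-cluster distance is internally diverse, while a large inter-cluster distance marks a pair of groups whose members differ most across the measured descriptors.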

3.5. Structural Analysis for Quantitative Descriptors

The clustering pattern revealed a heterogeneous distribution of the agromorphological attributes of interest among the different clusters, thus reflecting the structural diversity that exists among the accessions with agronomic potential (Table 4).
Table 4. Overall means and standard deviation (S.D.) of 33 quantitative agromorphological descriptors by phenotypic clusters.
Regarding the floral descriptors, Cluster I exhibited the highest mean values in seven out of the eleven traits analyzed, notably SpL (8.88 mm), StL (7.58 mm), and PL (5.00 mm), indicating a more pronounced floral development. In contrast, Cluster V was distinguished by accessions with greater SL (3.71 mm) and OL (2.03 mm).
For the fruit descriptors, seven out of the nine traits with the highest mean values were recorded in Cluster II, highlighting FM (1267.08 g), PM (972.28 g), and fruit dimensions, with 23.52 cm for FrL and 11.28 cm for FW, suggesting enhanced structural and productive development within this group.
Concerning seed descriptors, Cluster VIII exhibited the highest average SI (1.53 g), along with SD of 15.16 mm and ST of 10.20 mm. Conversely, accessions in Cluster III were characterized by their high prolificity, as evidenced by the highest mean values for NSL (9.09), NSF (46.02), and NIS (41.95), which resulted in a higher FSMF (180.4 g). Finally, Cluster II stood out for presenting the lowest mean PI value (17.63), indicating a better proportion of seed mass per fruit.

3.6. Structural Analysis for Qualitative Descriptors

Frequency analysis revealed marked variation in morphological patterns among the accessions of the eight phenotypic clusters (Figure 8; Table S5). For the foliar descriptors, substantial variability was observed in LBS, with accessions from Cluster V predominantly exhibiting an acute base (60%), whereas Cluster IV showed a higher proportion with an obtuse base (50%). Conversely, Cluster VII was distinguished by a high frequency of accessions with a cordiform shape (57.14%) and was the only group that exhibited acuminate apices (LAS; 3.57%).
Figure 8. Frequency distribution of qualitative descriptors among clusters. LBS = Leaf base shape; LAS = Leaf apex shape; YLC = Young leaf color; PC = Pedicel color; ASe = Anthocyanin in sepals; ASt = Anthocyanin in staminodes; AF = Anthocyanin in filament; AO = Anthocyanin in ovary; AL = Anthocyanin in ligule; IFC = Immature fruit color; MFC = Mature fruit color; FS = Fruit shape; FAS = Fruit apex shape; FBC = Fruit basal constriction; FR = Fruit roughness; STS = Seed transversal shape; SLS = Seed longitudinal shape; SC = Seed color.
With respect to floral descriptors, PC showed that only accessions in Cluster II displayed a high frequency of green color (60%), whereas in the remaining groups, reddish green predominated, exceeding 60%. The presence of anthocyanins in sepals was low across all groups, being recorded in less than 50% of accessions, while in staminodes (ASt), anthocyanins were consistently present in 100% of grouped accessions. Conversely, for the ovary, accessions in Cluster V were characterized by a complete absence of anthocyanins (100%).
Regarding fruit coloration, Clusters II and III predominantly exhibited a green hue for the IFC descriptor, with 90% and 82.35% of accessions, respectively. In contrast, only Clusters III, VI, VII, and VIII developed red coloration in variable proportions for the MFC descriptor, whereas yellow and orange hues were present across all clusters. Concerning FS, Clusters II and IV exclusively displayed oblong and elliptical fruits, whereas Cluster III included 11.76% of accessions with ovate fruits. With respect to the FR descriptor, this trait was most prominent in Cluster II, where 40% of the accessions exhibited a rough pericarp. In contrast, Clusters I, III, and VIII lacked this characteristic, with absence rates of 11.11%, 5.8%, and 5%, respectively.
In the seed descriptors, STS showed clear differences among clusters. In Cluster II, 80% of accessions exhibited an intermediate shape and 20% a flattened form; by contrast, Cluster V was characterized by a higher proportion of flattened seeds (70%). With respect to SC, only Cluster III contained accessions with white seeds (5.88%) and Cluster VII with pink seeds (3.57%), while violet and purple colorations predominated in varying proportions across the remaining groups (Table S5).

3.7. Selection of Promising Accessions

The scatter diagram (Figure 9, Table S6) illustrates the combined selection of accessions using the BYSI, simultaneously considering two discriminant descriptors closely linked to prolificity attributes due to their relation with traits associated with yield potential. Superior accessions were identified as those exceeding 0.8 g in seed index (SI) and presenting a pod index (PI) below 23.5, located in quadrants II and III of the Cartesian plane.
Figure 9. Bivariate distribution and multigroup selection diagram of promising plants in the germplasm of 113 cacao accessions. The orange rectangle highlights the promising accessions identified within the different clusters.
Based on the selection diagram, 12 accessions with high potential were identified (Table S6), representing 10.62% of the characterized germplasm and distributed across different districts in the provinces of Bagua and Utcubamba (Table S1). Among these, accessions PER1004084 and PER1004080 stood out, demonstrating a higher average dry seed mass (>1.4 g) and requiring only 21 fruits to yield 1 kg of dry seeds (PI), compared to accession PER1004076 (PI = 23.119). Conversely, PER1004018, although also exhibiting seeds with high dry mass, required more than 25 fruits to produce 1 kg of dry seeds. Furthermore, the desirable traits were distributed across different phenotypic clusters, indicating that yield potential is not confined to a single group but rather dispersed among the accessions within the evaluated germplasm collection.
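The two-threshold selection described above can be sketched as a simple filter. The example assumes that the pod index is obtained as 1000 g divided by the dry seed mass per fruit (number of seeds per fruit times the seed index), which matches its interpretation as the number of fruits needed to yield 1 kg of dry seeds; the exact formula used for Table S6 is not given in this section. The accession codes and values below are hypothetical, while the thresholds (SI > 0.8 g and PI < 23.5) are those stated in the text.

```python
from dataclasses import dataclass

@dataclass
class Accession:
    code: str
    seeds_per_fruit: float
    seed_index_g: float  # mean dry mass of one seed (g)

    @property
    def pod_index(self):
        """Fruits needed for 1 kg of dry seeds (assumed formula)."""
        return 1000.0 / (self.seeds_per_fruit * self.seed_index_g)

def select_promising(accessions, min_si=0.8, max_pi=23.5):
    """Keep accessions that exceed the SI threshold and fall below the PI threshold."""
    return [
        a.code
        for a in accessions
        if a.seed_index_g > min_si and a.pod_index < max_pi
    ]

# Hypothetical accessions, not taken from the collection.
candidates = [
    Accession("ACC-A", seeds_per_fruit=40, seed_index_g=1.4),
    Accession("ACC-B", seeds_per_fruit=30, seed_index_g=1.0),
    Accession("ACC-C", seeds_per_fruit=45, seed_index_g=0.7),
]
selected = select_promising(candidates)
print(selected)
```

Here only ACC-A passes both criteria: ACC-B has heavy enough seeds but too few of them per fruit, and ACC-C fails the seed index threshold.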
Finally, the characterization of the germplasm under a permanent environment across two consecutive seasons revealed moderate to high repeatability values (r̂ > 0.40) for both discriminant descriptors (Table S7). These results highlight that these traits possess intermediate to high heritability magnitude, supporting their reliability for characterization and selection in genetic improvement programs.
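As an illustration of how a repeatability coefficient can be obtained from repeated measurements, the sketch below uses the one-way ANOVA intraclass correlation, r = (MSB - MSW) / (MSB + (m - 1) MSW), where MSB and MSW are the between- and within-accession mean squares and m is the number of seasons. This estimator is an assumption chosen for simplicity; the method behind Table S7 is not described in this section, and the measurements below are hypothetical.

```python
def repeatability(measurements):
    """Intraclass correlation from a balanced one-way layout.

    measurements: list of per-accession lists, each with m repeated values.
    """
    n = len(measurements)
    m = len(measurements[0])
    means = [sum(row) / m for row in measurements]
    grand_mean = sum(means) / n

    # Between-accession and within-accession mean squares.
    msb = m * sum((mu - grand_mean) ** 2 for mu in means) / (n - 1)
    msw = sum(
        (x - mu) ** 2 for row, mu in zip(measurements, means) for x in row
    ) / (n * (m - 1))
    return (msb - msw) / (msb + (m - 1) * msw)

# Hypothetical seed index values (g) for three accessions over two seasons.
two_seasons = [[1.0, 1.2], [1.5, 1.5], [0.8, 1.0]]
r_hat = repeatability(two_seasons)
print(round(r_hat, 3))
```

Values near 1 indicate that accessions keep their ranking from one season to the next, whereas values near 0 indicate that seasonal fluctuation dominates.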

3.8. Phytochemical Profile of Selected Cacao Cotyledons

The analysis of bioactive compound content (Figure 10a and Table S8) revealed that among the 15 accessions evaluated, PER1004092 exhibited the highest theobromine content (25.34 ± 0.50 mg/g), whereas PER1004082 had the highest caffeine concentration (5.76 ± 0.22 mg/g). PER1004091 was notable for its epicatechin content (26.98 ± 0.26 mg/g), while catechin was absent in 7 of the 15 accessions studied.
Figure 10. Phytochemical characterization of cacao cotyledons: (a) bioactive compound content; (b) classification of cacao accessions; (c,d) polyphenol and antioxidant contents among accessions, color-coded according to the phenotypic clusters defined in this study, where different letters indicate significant differences among clusters (Tukey’s HSD test; p ≤ 0.05); (e) colorimetric analysis of lyophilized cotyledons in the a* and b* plane; (f) FTIR analysis of functional groups; and (g) Pearson correlation network.
Figure 10b illustrates a heterogeneous genetic structure, with two accessions (PER1004074 and PER1004091) clustering within the Criollo genetic group, whereas the majority (10 out of 15) group within the Forastero clade, encompassing accessions whose phenotypic profiles correspond to the distinct groups defined in this study.
The highest antioxidant capacity and total polyphenol content were detected in accessions PER1004091 (DPPH = 35.91 mg TE/g) and PER1004092 (TPC = 77.21 mg GAE/g), respectively (Figure 10c,d; Table S8). On the other hand, Figure 10e reveals a chromatic shift toward reddish–yellowish tones, as indicated by the clustering of lyophilized cotyledons within the positive quadrant of the a* and b* axes in the CIELAB space, albeit with varying intensities.
Figure 10f displays the normalized FTIR spectra of lyophilized cotyledons from 15 accessions, revealing a high degree of spectral similarity and the presence of characteristic bands associated with functional groups such as hydroxyl (O–H), hydrocarbons (C–H), carbonyls (C=O), aromatics and alkenes (C=C), esters, and carbohydrates (C–O). These results underscore the functional potential of the analyzed accessions as fine-flavor cacao.
The Pearson correlation network among phytochemical compounds (Figure 10g; Table S9) reflects significant associations, highlighting strong positive correlations between cyanidin 3-O-glucoside and caffeine (r = 0.86***), as well as between TPC and theobromine (r = 0.95***). Additional positive correlations were observed among epicatechin, color coordinates (a* and b*), and pH, as well as between TA and color. In contrast, strong negative correlations were detected between TPC and color b* (r = −0.92***), and between caffeic acid and caffeine (r = −0.83***).

4. Discussion

4.1. Ex Situ Germplasm Collection Management

The establishment of a germplasm bank of fine-flavor native cacao in the Amazon region of northeastern Peru represents a strategic initiative to safeguard genetic diversity and enhance its effective use in breeding programs aimed at benefiting future generations. This region, acknowledged as one of the centers of origin of cacao, harbors genotypes of high agronomic and commercial value, characterized by their potential for producing functional chocolates, their reduced cadmium uptake and translocation, and their high yield potential associated with significant disease tolerance [,]. The variability encompassed within these native Amazonian cacao genetic resources offers a valuable reservoir of adaptive alleles that can be harnessed to enhance crop productivity and resilience under changing climatic conditions.
In the Amazonas region, recent scientific advances have elucidated the intraspecific relationships of fine-flavor cacao populations, revealing a considerable proportion of heterozygous genotypes and a reduced presence of homozygous individuals among accessions from the provinces of Bagua and Utcubamba []. This finding supports the high phenotypic diversity documented in the present germplasm collection, which is particularly relevant given that its integration with advanced biotechnological tools provides a promising framework for cacao genetic improvement in Peru. In this context, the incorporation of approaches such as CRISPR-Cas9-based genome editing to discriminate between fine-flavor and bulk cacao, combined with the use of multi-omics platforms, could markedly accelerate the introgression of desirable phenotypic and sensory traits, thereby enhancing breeding efficiency [,,]. Although the adoption of genomic approaches in developing countries continues to face technical and infrastructural constraints, sustained collaboration between local institutions and international research centers remains essential to overcome these challenges and ensure that scientific progress directly benefits smallholder farmers.
As part of strategies for the conservation and sustainable utilization of terrestrial ecosystems within the framework of the Sustainable Development Goals (SDGs), an ex situ collection of fine-flavor cacao germplasm was successfully established in the Amazonas region, achieving a 100% survival rate due to the implementation of standardized grafting techniques and the strategic use of the IMC 67 clone as rootstock. This result not only confirms the effectiveness of the established protocol but also underscores the agronomic attributes of IMC 67, whose vigorous root system and tolerance to soil-borne pathogens likely enhanced nutrient uptake efficiency without significantly altering the physical or organoleptic characteristics of the scion fruits [,]. Despite its relatively narrow genetic base, the IMC 67 clone has demonstrated consistent performance and high efficiency across different regions of the country, establishing itself as a reliable rootstock for clonal propagation and germplasm bank establishment. These findings reinforce its strategic importance in cacao conservation and genetic improvement programs in the Amazon region [].

4.2. Phenotyping of Genetic Resources

Understanding the population structure and diversity within the center of origin of cacao is crucial for guiding conservation strategies and promoting the sustainable utilization of native varieties in production systems []. As a perennial, cross-pollinated crop with a diploid genome (2n = 20), cacao exhibits inherently high levels of heterozygosity []. This condition implies that crosses between two plants generate offspring with high genetic variability and a non-uniform distribution of traits, thereby ensuring evolutionary survival and providing opportunities for crop improvement [,].
The expression of both qualitative and quantitative traits in cacao is strongly conditioned by genotype–environment interactions []. In this regard, the Amazonas region stands out as a reservoir of diverse ecotypes, whose reproductive attributes reflect the combined effects of natural evolutionary processes and human-mediated selection []. Although conventional phenotyping is fundamental for defining cacao breeding goals, characterization based solely on external morphological traits has inherent limitations; while valuable for assessing existing diversity, it is inherently subjective and dependent on the evaluator’s perception []. Such constraints reduce its precision as an identification tool, underscoring the importance of integrating genotypic approaches to enhance the delineation and classification of germplasm collections.
Within the evaluated germplasm, leaf characterization allowed the identification of plants exhibiting a larger leaf area, attributed to both their dimensions and a morphology predominantly defined by an apiculate apex with an acute and cordiform base. These plants also displayed young leaves with brownish to reddish pigmentation, a distinctive feature during the early stages of leaf development. This phenotypic variation is likely associated with a complex genetic architecture, resulting from the interaction between determinant genes and multiple loci with polygenic effects, which may underlie a multifactorial inheritance pattern linked to adaptive mechanisms and the expression of a higher diversity index [,]. Collectively, these findings align with recent studies demonstrating that the remarkable phenotypic variability of cacao arises from the cumulative effects of long-term natural selection and human management practices historically exerted across diverse Amazonian microenvironments [,].
Cauliflory represents the principal distinguishing feature of the cacao flowering pattern, characterized by the emergence of protandrous flowers directly from the trunk and main branches, forming structures known as floral cushions [,]. In the evaluated germplasm, the remarkable morphological variability at the floral level constitutes a highly valuable trait for characterization and conservation purposes. This diversity is influenced by the interaction between the genotype and the positional context of the flower on the plant, both of which determine its morphology and frequency of occurrence []. In this context, previous studies have demonstrated that anthocyanin pigmentation in floral organs such as the pedicel, staminodes, and ligule exhibits the highest discriminatory power among accessions within cacao germplasm collections [,].
In the analyzed germplasm, anthocyanins were consistently detected in the staminodes, with over 80 accessions exhibiting anthocyanins in the filament, ovary, and ligule. This pigmentation pattern is characteristic of materials originating exclusively from the Upper Amazon region of Peru []. The occurrence of this coloration may be explained by the complex interaction between floral morphogenetic traits and the selective pressures imposed by specific pollinators []. Moreover, the observed agromorphological variation in the flower’s traits could be modulated or stabilized through the strategic use of rootstocks, given their substantial influence on the phenotypic expression of the scion. Such an effect could contribute to reducing variability in floral characteristics, thereby enhancing the stability of productive yield in certain accessions [,].
Fruits exhibited marked heterogeneity in both shape and coloration among accessions throughout their developmental and ripening stages. Notably, fruits with greater pericarp thickness stand out for presenting a lower incidence of Phytophthora palmivora, whereas rougher surfaces were linked to greater susceptibility to the pathogen []. Likewise, fruits characterized by an attenuated apex have been reported to exhibit reduced water accumulation and retention on their surface compared with fruits possessing an obtuse or rounded apex []. Such structural traits serve as valuable indicators for distinguishing accessions that are tolerant or susceptible to Phytophthora species, emphasizing the need to implement comprehensive conservation and characterization strategies aimed at broadening the genetic base of cacao and enhancing its agronomic performance [,]. Collectively, these findings underscore that the phenotypic diversity documented in the evaluated collection represents a strategic reservoir for developing resilient varieties through targeted breeding programs adapted to specific phytosanitary conditions.
For the quantitative descriptors, all fruit-related traits in the collection exhibited high heritability values (>60%) and genotypic coefficients of variation exceeding 10% in 6.6% of the evaluated descriptors, indicating that genetic factors predominantly govern the phenotypic expression of these traits. This finding suggests that the observed variability is largely determined by genetic factors, with lower environmental influence []. In this context, several accessions were distinguished by their superior mean values for fruit length (19.54 cm) and width (9.85 cm), which surpassed the overall means of 16.21 cm and 8.38 cm, respectively, reported for 140 native cacao accessions collected in the Loreto region of Peru [].
The morphological variability of cacao seeds may be associated with the xenia effect, a phenomenon in which pollen from other trees influences traits such as almond size and coloration, resulting in fruits exhibiting either uniform or mixed grain coloration []. However, white seed coloration is typically characteristic of Criollo-type accessions and certain Trinitario hybrids, whereas purple pigmentation predominates in Forastero-type materials [,,]. Additionally, the occurrence of oblong-shaped seeds has been linked to more uniform fermentation during postharvest processing compared with oval or elliptical seeds, thereby exerting a positive influence on flavor profile development [].
Regarding the quantitative traits of the seeds, the evaluated collection exhibited an average pod index of 20.57, which falls within the optimal range for selecting trees that require fewer pods to produce 1 kg of dry seed mass, as they combine cotyledons with an individual mass exceeding 1 g []. This attribute is associated with a higher fat content and a lower shell proportion, factors that contribute positively to crop productivity and bean quality [,]. However, the number of seeds per fruit is largely determined by the number of ovules present in the ovary, as well as the effectiveness of the pollination process []. Therefore, the pod index and the seed index, evaluated jointly, constitute highly heritable and decisive parameters for identifying promising accessions with an emphasis on higher yield potential [,,].
The distinctiveness of Peruvian fine-flavor cacao has been supported by genetic matching studies, which identified distinctive multilocus profiles among accessions collected in the Amazonas region, suggesting possible differentiation between genetic groups for the different provinces of the region []. In the present study, multivariate analyses revealed eight clearly defined phenotypic groups, reinforcing the hypothesis of a heritable genetic basis underlying the observed morphological variation. This structural pattern is consistent with previous studies highlighting the value of agromorphological traits as indirect tools for inferring genetic diversity patterns within cacao populations [].
Recent studies conducted in the Amazonas and Loreto regions of northeastern Peru have identified five genetic groups across collections of 146 and 140 accessions, respectively, each exhibiting distinct morphological characteristics [,]. These findings suggest that the inclusion of accessions from different altitudinal gradients enhances the overall genetic diversity within the populations []. However, the morphological variability documented for the same clone across regions highlights the necessity of standardizing the phenotypic criteria employed and of further investigating whether such variations represent novel genetic lineages, subpopulations, or closely related clades []. In this context, it should be emphasized that membership in a phenotypic group does not necessarily indicate superior agronomic performance, underscoring the importance of evaluating each accession individually throughout its production cycle under diverse environmental conditions to ensure its appropriate conservation and utilization.
The concurrence of multiple germplasm accessions within the same cluster, based on their affinity across various descriptors, enables a comprehensive classification that facilitates the identification of agronomically relevant strategies []. Clusters III and II were notable for comprising phenotypically consistent accessions with high reproducibility, distinguished by their greater mean seed mass and lower pod index. These characteristics make them preferential candidates for intra-group selection and clonal propagation due to their superior productive efficiency. In contrast, groups characterized by fruits with favorable biometric proportions but smaller and fewer seeds represent suitable parental materials for inter-group hybridizations aimed at exploiting heterosis and the complementarity of quantitative traits []. Furthermore, groups comprising accessions with advantageous seed and fruit attributes should be prioritized as maternal parental sources, given their larger cotyledon size and lower pod index. Conversely, groups characterized by high fruit load and superior plant health could serve as paternal parents, considering prospective evaluations of postharvest quality to substantiate their potential in breeding programs [,].

4.3. Phytochemical Profile of Fine-Flavor Cacao

Cacao beans are a natural source of methylxanthines, phenolic compounds, and antioxidants, whose moderate consumption enhances physiological functions, reduces stress, and promotes overall well-being beyond their nutritional value. However, elevated levels of methylxanthines intensify bitterness and astringency, partially masking fruity, sweet, and caramel notes that consumers perceive differently according to their preferred sensory profile of cacao [,]. Moreover, the high postharvest stability, bioavailability, and rapid excretion of methylxanthines support their functional efficacy, as demonstrated by their direct correlation with metabolites in biological fluids [,]. This characteristic underscores the potential of the evaluated accessions as reliable sources of bioactive compounds, strategically positioning them within the global fine-flavor cocoa market.
The theobromine/caffeine ratio is a phytochemical marker traditionally used to genetically differentiate cacao into the Criollo, Trinitario, and Forastero groups. Criollo cocoas exhibit values < 2, Trinitario types range between 2–9, and Forastero types display values > 9 []. Ratios below 2 are typically associated with fine-flavor Criollo genotypes, whereas values exceeding 9 are linked to sensory profiles characteristic of bulk cacao []. Nevertheless, recent evidence, consistent with our findings, highlights the potential of Forastero genotypes for fine-flavor cacao production [], thereby redefining the traditional paradigm that had previously confined this category exclusively to Criollo varieties.
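The ratio-based grouping described above reduces to a simple threshold rule, sketched below. The function name and the example concentrations are hypothetical; only the cut-offs (< 2, 2–9, > 9) come from the text, and ratios falling exactly on 2 or 9 are assigned here to the Trinitario range as an assumption.

```python
def classify_by_ratio(theobromine_mg_g, caffeine_mg_g):
    """Assign a genetic group from the theobromine/caffeine ratio."""
    if caffeine_mg_g <= 0:
        raise ValueError("caffeine content must be positive to form a ratio")
    ratio = theobromine_mg_g / caffeine_mg_g
    if ratio < 2:
        group = "Criollo"
    elif ratio <= 9:
        group = "Trinitario"
    else:
        group = "Forastero"
    return ratio, group

# Hypothetical concentrations (mg/g), not measurements from this study.
examples = {
    "sample_1": (10.0, 6.0),
    "sample_2": (20.0, 4.0),
    "sample_3": (25.0, 2.0),
}
groups = {name: classify_by_ratio(*values)[1] for name, values in examples.items()}
print(groups)
```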
Fine-flavor cacao is also characterized by high concentrations of catechin and epicatechin. In this context, recent studies conducted in the province of Utcubamba (Amazonas) reported only 0.20 mg/g of catechin and 0.50 mg/g of epicatechin [], values markedly lower than the 4.10 mg/g and 7.66 mg/g observed in the PER1004023 accession from the studied collection. Nonetheless, the fermentation process, driven by fungi such as Candida and Aspergillus [], together with Maillard reactions and Strecker degradation occurring during drying and roasting, substantially reduces bioactive compounds while simultaneously promoting the formation of aromatic molecules through the interaction of carbonyl and nitrogenous compounds []. This balance underscores the importance of selecting accessions with a strong capacity to synthesize these metabolites.
FTIR analysis of cacao almonds revealed characteristic signals corresponding to phenols (3562–3322 cm−1, O–H) and aromatic compounds (2925–2854 cm−1, C–H). Consistent with our results, previous studies have reported the presence of these same functional groups in processed almonds [,], confirming the persistence of chemical groups associated with antioxidant capacity. Moreover, the peaks observed between 1750 and 1625 cm−1 in our spectra correspond to the characteristic stretching vibrations of carboxylic acids and aldehydes (C=O), fatty acids (C=C), and proteins (C=O), as similarly reported for Ecuadorian cocoa. The same study associates the signal at 1510 cm−1 with phenols [], the 1726 cm−1 peak with ester carbonyl groups [], and the signal at 3302 cm−1 with alcohols and amines present in compounds such as serotonin. In accordance with our findings, recent evidence indicates that cacao concentrates key functional groups within the 4000–1500 cm−1 range, representing a significant contribution to the chemical characterization of fine-flavor cacao [].

5. Conclusions

This study reports the successful establishment and comprehensive characterization of a fine-flavor cacao germplasm collection in the Amazonas region of Peru, whose sustained survival over nearly a decade demonstrates the remarkable adaptability and physiological compatibility between the selected rootstocks and scions.
The broad phenotypic variability and high heritability observed within this collection enabled its classification into eight clearly differentiated groups, consolidating it as a strategic and high-value resource for research focused on cacao genetic improvement. This potential is further supported by the significant differences detected in the phytochemical profiles of promising accessions, which exhibited elevated bioactive potential as determined through complementary HPLC and FTIR analyses. Collectively, these findings highlight the collection as a critical reservoir of agromorphological diversity, essential for the development of high-yielding and functionally superior genotypes, with direct implications for conservation and breeding programs.
While this study provides valuable insights into the phenotypic diversity of cacao, certain limitations should be acknowledged. The pronounced influence of the genotype by environment interaction on the expression of the evaluated descriptors underscores the importance of validating these findings across a broader range of agroecological contexts to identify consistent and stable patterns of variation. Similarly, expanding the geographical scope of sampling beyond the evaluated provinces could reveal additional reservoirs of adaptive variability. Addressing these limitations will not only enhance the selection of resilient and productive genotypes but also strengthen conservation strategies, thereby ensuring the sustainability and competitiveness of Amazonian cacao in the face of climate change challenges and the growing demand for differentiated, high-quality products.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/plants14223536/s1, Figure S1: Germplasm collection of Theobroma cacao L.; Figure S2: Hydrometeorological diagram of the studied cropping seasons; Figure S3: Distribution histograms of germplasm based on quantitative descriptors; Figure S4: Optimal number of clusters determined using the silhouette method; Table S1: Collection points; Table S2: Association analysis among quantitative descriptors; Table S3: PCA and eigenvalues of the first twelve dimensions derived from 33 quantitative descriptors in 113 cacao accessions; Table S4: Eight phenotypic clusters of cacao based on quantitative descriptors; Table S5: Percentage distribution of frequencies for qualitative descriptors; Table S6: BYSI estimation with the 95% highest posterior density interval for agromorphological yield descriptors in 113 cacao accessions; Table S7: Estimation of repeatability and variance components associated with seed and pod index in 113 cacao accessions from Amazonas, Peru; Table S8: Characterization of phytochemical compounds in cacao cotyledons; Table S9: Pearson correlation values for phytochemical compounds in cacao cotyledons.

Author Contributions

J.J.T.-A.: Conceptualization, statistical analysis, methodology and writing—original draft; N.C.V.-V.: Data curation and funding acquisition; L.A.M.-A.: Investigation and methodology; J.A.P.-Q.: Conceptualization, investigation, and formal analysis; E.F.: Project administration and supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported technically and financially by the investment project (IP) with CUI No. 2480490, entitled “Mejoramiento de los servicios de investigación en la caracterización de los recursos genéticos de la agrobiodiversidad en 17 departamentos del Perú—ProAgrobio”, executed by the Subdirectorate of Genetic Resources of the Directorate of Genetic Resources and Biotechnology at the National Institute of Agricultural Innovation (INIA).

Data Availability Statement

Data are contained within the article and Supplementary Materials.

Acknowledgments

We extend our sincere gratitude to engineers Roiber Malqui Ramos and Jheiner Vásquez García, as well as to technicians Royser Jiménez Tarrillo and Jitler Huaman Abad, whose expertise and dedication were crucial for the optimal management of the germplasm collection and the gathering of agromorphological data from the accessions evaluated in this study.

Conflicts of Interest

The authors declare no financial, personal, or professional conflicts of interest that could have influenced the development or presentation of this work.

References

  1. Boukrouh, S.; Noutfia, A.; Moula, N.; Avril, C.M.; Louvieaux, J.; Hornick, J.L.; Cabaraux, J.; Chentouf, M. Ecological, morpho-agronomical, and bromatological assessment of sorghum ecotypes in Northern Morocco. Sci. Rep. 2023, 13, 15548. [Google Scholar] [CrossRef]
  2. Kouam, E.B.; Kamga-Fotso, A.M.A.; Anoumaa, M. Exploring agro-morphological profiles of Phaseolus vulgaris germplasm shows manifest diversity and opportunities for genetic improvement. J. Agric. Food Res. 2023, 14, 100772. [Google Scholar] [CrossRef]
  3. Alverson, W.S.; Whitlock, B.A.; Nyffeler, R.; Bayer, C.; Baum, D.A. Phylogeny of the core Malvales: Evidence from ndhF sequence data. Am. J. Bot. 1999, 86, 1474–1486. [Google Scholar] [CrossRef]
  4. Motamayor, J.C.; Lanaud, C. Molecular analysis of the origin and domestication of Theobroma cacao L. In Managing Plant Genetic Diversity; Engels, J.M.M., Rao, R., Brown, A.H.D., Jackson, M.T., Eds.; CABI Publishing: Wallingford, UK, 2002; pp. 77–87. [Google Scholar] [CrossRef]
  5. Zarrillo, S.; Gaikwad, N.; Lanaud, C.; Powis, T.; Viot, C.; Lesur, I.; Fouet, O.; Argout, X.; Guichoux, E.; Salin, F.; et al. The use and domestication of Theobroma cacao during the mid-Holocene in the upper Amazon. Nat. Ecol. Evol. 2018, 2, 1879–1888. [Google Scholar] [CrossRef]
  6. Oliva-Cruz, M.; Goñas, M.; Bobadilla, L.G.; Rubio, K.B.; Escobedo-Ocampo, P.; García Rosero, L.M.; Briceño, N.B.R.; Maicelo-Quintana, J.L. Genetic groups of fine-aroma native cacao based on morphological and sensory descriptors in northeast Peru. Front. Plant Sci. 2022, 13, 896332. [Google Scholar] [CrossRef]
  7. Oliva-Cruz, M.; Mori-Culqui, P.L.; Caetano, A.C.; Goñas, M.; Vilca-Valqui, N.C.; Chavez, S.G. Total fat content and fatty acid profile of fine-aroma cocoa from northeastern Peru. Front. Nutr. 2021, 8, 677000. [Google Scholar] [CrossRef]
  8. Wickramasuriya, A.M.; Dunwell, J.M. Cacao biotechnology: Current status and future prospects. Plant Biotechnol. J. 2018, 16, 4–17. [Google Scholar] [CrossRef] [PubMed]
  9. Vrushali, B. Chocolate Market Size and Outlook, 2025–2033. 2025. Available online: https://straitsresearch.com/report/chocolate-market (accessed on 15 October 2025).
  10. Díaz-Valderrama, J.R.; Leiva-Espinoza, S.T.; Aime, M.C. The history of cacao and its diseases in the Americas. Phytopathology 2020, 110, 1604–1619. [Google Scholar] [CrossRef] [PubMed]
  11. Zhang, D.; Martínez, W.J.; Johnson, E.S.; Somarriba, E.; Phillips-Mora, W.; Astorga, C.; Mischke, S.; Meinhardt, L.W. Genetic diversity and spatial structure in a new distinct Theobroma cacao L. population in Bolivia. Genet. Resour. Crop Evol. 2012, 59, 239–252. [Google Scholar] [CrossRef]
  12. Motamayor, J.C.; Lachenaud, P.; Mota, J.W.S.; Loor, R.; Kuhn, D.N.; Brown, J.S.; Schnell, R.J. Geographic and genetic population differentiation of the amazonian chocolate tree (Theobroma cacao L). PLoS ONE 2008, 3, e3311. [Google Scholar] [CrossRef]
  13. Thomas, E.; Zonneveld, M.V.; Loo, J.; Hodgkin, T.; Galluzzi, G.; Etten, J.V. Present spatial diversity patterns of Theobroma cacao L. in the Neotropics reflect genetic differentiation in Pleistocene refugia followed by human-influenced dispersal. PLoS ONE 2012, 7, e47676. [Google Scholar] [CrossRef]
  14. INDECOPI. Denominación de Origen Cacao Amazonas Perú. 2016. Available online: https://www.gob.pe/institucion/indecopi/informes-publicaciones/5227603-denominacion-de-origen-cacao-amazonas-peru-2016 (accessed on 17 July 2025).
  15. ICCO. Panel Recognizes 23 Countries as Fine and Flavour Cocoa Exporters. 2016. Available online: https://www.icco.org/icco-panel-recognizes-23-countries-as-fine-and-flavour-cocoa-exporters/ (accessed on 17 July 2025).
  16. Pridmore, R.D.; Crouzillat, D.; Walker, C.; Foley, S.; Zink, R.; Zwahlen, M.C.; Brüssow, H.; Pétiard, V.; Mollet, B. Genomics, molecular genetics and the food industry. J. Biotechnol. 2000, 78, 251–258. [Google Scholar] [CrossRef]
  17. González Castro, J.B.; Torres Armas, E.A. Caracterización de Productores en la Cadena de Valor del Cacao Fino de Aroma de Amazonas. Bachelor’s Thesis, Universidad San Pedro, Chimbote, Perú, 2018. Available online: https://alicia.concytec.gob.pe/vufind/Record/REVUSANP_562b4608644571802674d0f35b5654ab (accessed on 17 July 2025).
  18. Doaré, F.; Ribeyre, F.; Cilas, C. Genetic and environmental links between traits of cocoa beans and pods clarify the phenotyping processes to be implemented. Sci. Rep. 2020, 10, 9888. [Google Scholar] [CrossRef]
  19. Ramanatha Rao, V.; Hodgkin, T. Genetic diversity and conservation and utilization of plant genetic resources. Plant Cell Tiss Organ Cult. 2002, 68, 1–19. [Google Scholar] [CrossRef]
  20. Adenuga, O.O.; Ariyo, O.J. Diversity Analysis of Cacao (Theobroma cacao) Genotypes in Nigeria based on juvenile phenotypic plant traits. Int. J. Fruit Sci. 2020, 20, S1348–S1359. [Google Scholar] [CrossRef]
  21. Oliva-Cruz, M.; Goñas, M.; García, L.M.; Rabanal-Oyarse, R.; Alvarado-Chuqui, C.; Escobedo-Ocampo, P.; Maicelo-Quintana, J.L. Phenotypic characterization of fine-aroma cocoa from northeastern Peru. Int. J. Agron. 2021, 2021, 2909909. [Google Scholar] [CrossRef]
  22. Ibrahim Bio Yerima, A.R.; Achigan-Dako, E.G.; Aissata, M.; Sekloka, E.; Billot, C.; Adje, C.O.A.; Barnaud, A.; Bakasso, Y. Agromorphological characterization revealed three phenotypic groups in a region-wide germplasm of fonio (Digitaria exilis (Kippist) Stapf) from West Africa. Agronomy 2020, 10, 1653. [Google Scholar] [CrossRef]
  23. Paredes-Espinosa, R.; Gutiérrez-Reynoso, D.L.; Atoche-Garay, D.; Mansilla-Córdova, P.J.; Abad-Romaní, Y.; Girón-Aguilar, C.; Flores-Torres, I.; Montañez-Artica, A.G.; Arbizu, C.I.; Guerra, C.A.A.; et al. Agro-morphological characterization and diversity analysis of Coffea arabica germplasm collection from INIA, Peru. Crop Sci. 2023, 63, 2877–2893. [Google Scholar] [CrossRef]
  24. Yeshitila, M.; Gedebo, A.; Tesfaye, B.; Degu, H.D. Agro-morphological genetic diversity assessment of Amaranthus genotypes from Ethiopia based on qualitative traits. CABI Agric. Biosci. 2024, 5, 95. [Google Scholar] [CrossRef]
  25. Baudoin Wouokoue, T.J.; Fouelefack, F.R.; Leticia Liejip, N.C.; Biakdjolbo, W.E.; Mafouo, T.E. Morphoagronomic and phenological characteristics of roselle (Hibiscus sabdariffa L.) grown in Sudano-Sahelian zone of Cameroon. Int. J. Agron. 2025, 2025, 5568972. [Google Scholar] [CrossRef]
  26. Hachef, A.; Bourguiba, H.; Cherif, E.; Ivorra, S.; Terral, J.F.; Zehdi-Azouzi, S. Agro-morphological traits assessment of Tunisian male date palms (Phœnix dactylifera L.) for preservation and sustainable utilization of local germplasm. Saudi J. Biol. Sci. 2023, 30, 103574. [Google Scholar] [CrossRef] [PubMed]
  27. Virga, G.; Licata, M.; Consentino, B.B.; Tuttolomondo, T.; Sabatino, L.; Leto, C.; La Bella, S. Agro-morphological characterization of sicilian chili pepper accessions for ornamental purposes. Plants 2020, 9, 1400. [Google Scholar] [CrossRef] [PubMed]
  28. Bekele, F.; Butler, D.R. Proposed short list of cocoa descriptors for characterization. In Working Procedures for Cocoa Germplasm Evaluation and Selection, Proceedings of the CFC/ICCO/IPGRI Project Workshop, Montpellier, France, 1–6 February 1998; Eskes, A.B., Engels, J.M.M., Lass, R.A., Eds.; Bioversity International: Rome, Italy, 2000; Available online: https://www.cabidigitallibrary.org/doi/full/10.5555/20001611520 (accessed on 22 July 2025).
  29. MIDAGRI. Banco de Germoplasma del Instituto Nacional de Innovación Agraria. 2025. Available online: https://genebankperu.inia.gob.pe/ (accessed on 28 June 2025).
  30. Vásquez-García, J.; Santos-Pelaez, J.C.; Malqui-Ramos, R.; Vigo, C.N.; C, W.A.; Bobadilla, L.G. Agromorphological characterization of cacao (Theobroma cacao L.) accessions from the germplasm bank of the National Institute of Agrarian Innovation, Peru. Heliyon 2022, 8, e10888. [Google Scholar] [CrossRef]
  31. NTP-ISO/IEC 17025:2017; Requisitos Generales para la Competencia de los Laboratorios de Ensayo y Calibración. Dirección de Normalización—INACAL: Lima, Peru, 2017.
  32. MINAM. Servicio Nacional de Meteorología e Hidrología del Perú—Datos hidrometeorológicos a nivel nacional. 2025. Available online: https://www.senamhi.gob.pe/?&p=estaciones (accessed on 10 June 2025).
  33. Bekele, F.L.; Bidaisee, G.G.; Singh, H.; Saravanakumar, D. Morphological characterisation and evaluation of cacao (Theobroma cacao L.) in Trinidad to facilitate utilisation of Trinitario cacao globally. Genet. Resour. Crop Evol. 2020, 67, 621–643. [Google Scholar] [CrossRef]
  34. Imán Correa, S.A.; Samanamud Curto, A.F.; Paredes Meneses, C.; Chuquizuta Del Castillo, B.; Arévalo Pinedo, M.T. Descriptores para cacao; INIA: Lima, Perú, 2024; Available online: https://hdl.handle.net/20.500.12955/2457 (accessed on 18 August 2025).
  35. Compañía Nacional de Chocolates S.A.S. Protocolo para la Caracterización Morfológica de Árboles Élite de Cacao (Teobroma cacao L.); Chocolates: Medellín, Colombia, 2018; Available online: https://share.google/vqNDpD2FAZxWIxha4 (accessed on 19 August 2025).
  36. Iwaro, A.D.; Bekele, F.L.; Butler, D.R. Evaluation and utilisation of cacao (Theobroma cacao L.) germplasm at the International Cocoa Genebank, Trinidad. Euphytica 2003, 130, 207–221. [Google Scholar] [CrossRef]
  37. Cortez, D.; Flores, M.; Calampa, L.L.; Oliva-Cruz, M.; Goñas, M.; Meléndez-Mori, J.B.; Chavez, S.G. From the seed to the cocoa liquor: Traceability of bioactive compounds during the postharvest process of cocoa in Amazonas-Peru. Microchem. J. 2024, 201, 110607. [Google Scholar] [CrossRef]
  38. Brand-Williams, W.; Cuvelier, M.E.; Berset, C. Use of a free radical method to evaluate antioxidant activity. LWT-Food Sci. Technol. 1995, 28, 25–30. [Google Scholar] [CrossRef]
  39. Singleton, V.L.; Rossi, J.A. Colorimetry of total phenolics with phosphomolybdic-phosphotungstic acid reagents. Am. J. Enol. Vitic. 1965, 16, 144–158. [Google Scholar] [CrossRef]
  40. Cortez, D.; Quispe-Sanchez, L.; Mestanza, M.; Oliva-Cruz, M.; Yoplac, I.; Torres, C.; La Bella, S. Changes in bioactive compounds during fermentation of cocoa (Theobroma cacao) harvested in Amazonas-Peru. Curr. Res. Food Sci. 2023, 6, 100494. [Google Scholar] [CrossRef]
  41. R Core Team. R: A Language and Environment for Statistical Computing; R Foundation for Statistical Computing: Vienna, Austria, 2024; Available online: https://www.r-project.org/ (accessed on 11 August 2025).
  42. Komsta, L. Outliers: Tests for Outliers. 2022. Available online: https://cran.r-project.org/web/packages/outliers (accessed on 11 August 2025).
  43. Comtois, D. summarytools: Tools to Quickly and Neatly Summarize Data. 2025. Available online: https://cran.r-project.org/web/packages/summarytools (accessed on 11 August 2025).
  44. Wickham, H. ggplot2: Elegant Graphics for Data Analysis; Springer: New York, NY, USA, 2016; Available online: https://ggplot2.tidyverse.org (accessed on 11 August 2025).
  45. Popat, R.; Patel, R.; Parmar, D. variability: Genetic Variability Analysis for Plant Breeding Research. 2020. Available online: https://cran.r-project.org/web/packages/variability (accessed on 11 August 2025).
  46. Harrell, F.E., Jr. Hmisc: Harrell Miscellaneous. 2025. Available online: https://cran.r-project.org/web/packages/Hmisc (accessed on 11 August 2025).
  47. Husson, F.; Josse, J.; Le, S.; Mazet, J. FactoMineR: Multivariate Exploratory Data Analysis and Data Mining. 2024. Available online: https://cran.r-project.org/web/packages/FactoMineR (accessed on 15 October 2025).
  48. Kassambara, A.; Mundt, F. factoextra: Extract and Visualize the Results of Multivariate Data Analyses. 2020. Available online: https://cran.r-project.org/web/packages/factoextra (accessed on 11 August 2025).
  49. Govindaraj, M.; Yadav, O.P.; Rajpurohit, B.S.; Kanatti, A.; Rai, K.N.; Dwivedi, S.L. Genetic variability, diversity and interrelationship for twelve grain minerals in 122 commercial pearl millet cultivars in India. Agric. Res. 2020, 9, 516–525. [Google Scholar] [CrossRef]
  50. Iezzoni, A.F.; Pritts, M.P. Applications of principal component analysis to horticultural research. HortScience 1991, 26, 334–338. [Google Scholar] [CrossRef]
  51. Maechler, M.; Rousseeuw, P.; Hubert, M.; Hornik, K.; Schubert, E. cluster: Finding Groups in Data—Cluster Analysis Extended. 2025. Available online: https://cran.r-project.org/web/packages/cluster (accessed on 11 August 2025).
  52. Suzuki, R.; Terada, Y.; Shimodaira, H. pvclust: Hierarchical Clustering with p-Values via Multiscale Bootstrap Resampling. 2019. Available online: https://cran.r-project.org/web/packages/pvclust (accessed on 15 October 2025).
  53. Gu, Z.; Hübschmann, D. Make interactive complex heatmaps in R. Bioinformatics 2022, 38, 1460–1462. [Google Scholar] [CrossRef] [PubMed]
  54. Galili, T. dendextend: Extending ‘dendrogram’ Functionality in R. 2025. Available online: https://cran.r-project.org/web/packages/dendextend (accessed on 11 August 2025).
  55. Wu, M.; Wang, S. Simultaneous optimal estimates of fixed effects and variance components in the mixed model. Sci. China Ser. A-Math. 2004, 47, 787–799. [Google Scholar] [CrossRef]
  56. Mendiburu, F. agricolae: Statistical Procedures for Agricultural Research. 2023. Available online: https://cran.r-project.org/web/packages/agricolae (accessed on 11 August 2025).
  57. Hadfield, J.D. MCMC methods for multi-response generalized linear mixed models: The MCMCglmm R package. J. Stat. Softw. 2010, 33, 1–22. [Google Scholar] [CrossRef]
  58. Espitia-Negrete, L.; Orozco-Orozco, L.F.; Torres, J.M.C.; Medina-Cano, C.I.; Grisales-Vasquez, N.Y. Fiber production repeatability and selection of promising fique (Furcraea spp.) genotypes. Crop Sci. 2024, 64, 2666–2678. [Google Scholar] [CrossRef]
  59. Plummer, M.; Best, N.; Cowles, K.; Vines, K. CODA: Convergence diagnosis and output analysis for MCMC. R News 2006, 6, 7–11. [Google Scholar]
  60. Wickham, H. tidyverse: Easily Install and Load the “Tidyverse”. 2023. Available online: https://cran.r-project.org/web/packages/tidyverse/ (accessed on 11 August 2025).
  61. Pedersen, T.L. ggforce: Accelerating “ggplot2”. 2025. Available online: https://cran.r-project.org/web/packages/ggforce (accessed on 11 August 2025).
  62. Hanson, B.A. Exploratory Chemometrics for Spectroscopy. 2025. Available online: https://bryanhanson.github.io/ChemoSpec/ (accessed on 11 August 2025).
  63. Meléndez-Mori, J.B.; Guerrero-Abad, J.C.; Tejada-Alvarado, J.J.; Ayala-Tocto, R.Y.; Oliva, M. Genotypic variation in cadmium uptake and accumulation among fine-aroma cacao genotypes from northern Peru: A model hydroponic culture study. Environ. Pollut. Bioavailab. 2023, 35, 2287710. [Google Scholar] [CrossRef]
  64. Tineo, D.; Bustamante, D.E.; Calderon, M.S.; Oliva, M. Comparative analyses of chloroplast genomes of Theobroma cacao from northern Peru. PLoS ONE 2025, 20, e0316148. [Google Scholar] [CrossRef]
  65. Scharf, A.; Lang, C.; Fischer, M. Genetic authentication: Differentiation of fine and bulk cocoa (Theobroma cacao L.) by a new CRISPR/Cas9-based in vitro method. Food Control 2020, 114, 107219. [Google Scholar] [CrossRef]
  66. Fister, A.S.; Landherr, L.; Maximova, S.N.; Guiltinan, M.J. Transient expression of CRISPR/Cas9 machinery targeting TcNPR3 enhances defense response in Theobroma cacao. Front. Plant Sci. 2018, 9, 268. [Google Scholar] [CrossRef]
  67. Chaturvedi, P.; Pierides, I.; Zhang, S.; Schwarzerova, J.; Ghatak, A.; Weckwerth, W. Multiomics for Crop Improvement. In Sustainability Sciences in Asia and Africa; Pandey, M.K., Bentley, A., Desmae, H., Roorkiwal, M., Varshney, R.K., Eds.; Springer Nature: Singapore, 2024; pp. 107–141. [Google Scholar] [CrossRef]
  68. Imán, S.A.; Samanamud, A.F.; Ramirez, J.F.; Cobos, M.; Paredes, C.; Castro, J.C. Development and phenotypic characterization of a native Theobroma cacao L. germplasm bank from the Loreto region of the Peruvian Amazon: Implications for Ex situ conservation and genetic improvement. Front. Conserv. Sci. 2025, 6, 1576239. [Google Scholar] [CrossRef]
  69. Schmidt, J.E.; DuVal, A.; Puig, A.; Tempeleu, A.; Crow, T. Interactive and dynamic effects of rootstock and rhizobiome on scion nutrition in cacao seedlings. Front. Agron. 2021, 3, 754646. [Google Scholar] [CrossRef]
  70. Bustamante, D.E.; Motilal, L.A.; Calderon, M.S.; Mahabir, A.; Oliva, M. Genetic diversity and population structure of fine aroma cacao (Theobroma cacao L.) from north Peru revealed by single nucleotide polymorphism (SNP) markers. Front. Ecol. Evol. 2022, 10, 895056. [Google Scholar] [CrossRef]
  71. Bekele, F.; Phillips-Mora, W. Cacao (Theobroma cacao L.) Breeding. In Advances in Plant Breeding Strategies: Industrial and Food Crops; Al-Khayri, J., Jain, S., Johnson, D., Eds.; Springer: Cham, Switzerland, 2019; pp. 409–487. [Google Scholar] [CrossRef]
  72. Mixão, V.; Nunez-Rodriguez, J.C.; del Olmo, V.; Ksiezopolska, E.; Saus, E.; Boekhout, T.; Gacser, A.; Gabaldón, T. Evolution of loss of heterozygosity patterns in hybrid genomes of Candida yeast pathogens. BMC Biol. 2023, 21, 105. [Google Scholar] [CrossRef] [PubMed]
  73. Jombart, T.; Devillard, S.; Balloux, F. Discriminant analysis of principal components: A new method for the analysis of genetically structured populations. BMC Genet. 2010, 11, 94. [Google Scholar] [CrossRef]
  74. Utama, R.F.; Gustian; Efendi, S. Morphological characterization of cacao plants (Theobroma cacao L.) from Dharmasraya Regency of West Sumatra. Celebes Agric. 2023, 4, 30–45. [Google Scholar] [CrossRef]
  75. Motamayor, J.C.; Risterucci, A.M.; Lopez, P.A.; Ortiz, C.F.; Moreno, A.; Lanaud, C. Cacao domestication I: The origin of the cacao cultivated by the Mayas. Heredity 2002, 89, 380–386. [Google Scholar] [CrossRef] [PubMed]
  76. Tlahig, S.; Mohamed, A.; Triki, T.; Yahia, Y.; Yehmed, J.; Yahia, H.; Guasmi, F.; Loumerem, M. Integrated agro-morphological and molecular characterization for progeny testing to enhance alfalfa breeding in arid regions of Tunisia. J. Agric. Food Res. 2025, 20, 101793. [Google Scholar] [CrossRef]
  77. Bidot Martínez, I.; Valdés de la Cruz, M.; Riera Nelson, M.; Bertin, P. Morphological characterization of traditional cacao (Theobroma cacao L.) plants in Cuba. Genet. Resour. Crop Evol. 2017, 64, 73–99. [Google Scholar] [CrossRef]
  78. Poorter, H.; Niklas, K.J.; Reich, P.B.; Oleksyn, J.; Poot, P.; Mommer, L. Biomass allocation to leaves, stems and roots: Meta-analyses of interspecific variation and environmental control. New Phytol. 2012, 193, 30–50. [Google Scholar] [CrossRef]
  79. Jaramillo, M.A.; Reyes-Palencia, J.; Jiménez, P. Floral biology and flower visitors of cocoa (Theobroma cacao L.) in the upper Magdalena Valley, Colombia. Flora 2024, 313, 152480. [Google Scholar] [CrossRef]
  80. Lim, S.; Baek, I.; Hong, S.M.; Lee, Y.; Kirubakaran, S.; Kim, M.S.; Meinhardt, L.W.; Park, S.; Ahn, E. Cacao floral traits are shaped by the interaction of flower position with genotype. Heliyon 2025, 11, e42407. [Google Scholar] [CrossRef]
  81. Osorio-Guarín, J.A.; Berdugo-Cely, J.; Coronado, R.A.; Zapata, Y.P.; Quintero, C.; Gallego-Sánchez, G.; Yockteng, R. Colombia a source of cacao genetic diversity as revealed by the population structure analysis of germplasm bank of Theobroma cacao L. Front. Plant Sci. 2017, 8, 1994. [Google Scholar] [CrossRef]
  82. Trunschke, J.; Lunau, K.; Pyke, G.H.; Ren, Z.X.; Wang, H. Flower color evolution and the evidence of pollinator-mediated selection. Front. Plant Sci. 2021, 12, 617851. [Google Scholar] [CrossRef]
  83. Rodriguez-Medina, C.; Arana, A.C.; Sounigo, O.; Argout, X.; Alvarado, G.A.; Yockteng, R. Cacao breeding in Colombia, past, present and future. Breed. Sci. 2019, 69, 373–382. [Google Scholar] [CrossRef]
  84. Muzuni; Ambardini, S.; Widyaningsih, A.S.; Ismaun. Morphological and physiological characteristics of cocoa (Theobroma cacao L. var. criollo) infected by Phytophthora palmivora in cocoa plantations in southeast Sulawesi Indonesia. AIP Conf. Proc. 2023, 2704, 020015. [Google Scholar] [CrossRef]
  85. Daymond, A.J.; Hadley, P. Differential effects of temperature on fruit development and bean quality of contrasting genotypes of cacao (Theobroma cacao). Ann. Appl. Biol. 2008, 153, 175–185. [Google Scholar] [CrossRef]
  86. Ten Hoopen, G.M.; Deberdt, P.; Mbenoun, M.; Cilas, C. Modelling cacao pod growth: Implications for disease control. Ann. Appl. Biol. 2012, 160, 260–272. [Google Scholar] [CrossRef]
  87. Nieves-Orduña, H.E.; Müller, M.; Krutovsky, K.V.; Gailing, O. Genotyping of cacao (Theobroma cacao L.) germplasm resources with SNP markers linked to agronomic traits reveals signs of selection. Tree Genet. Genomes 2024, 20, 13. [Google Scholar] [CrossRef]
  88. Suparno, A.; Arbianto, M.A.; Prabawardani, S.; Chadikun, P.; Tata, H.; Luhulima, F.D.N. The identification of yield components, genetic variability, and heritability to determine the superior cocoa trees in west Papua, Indonesia. Biodiversitas 2024, 25, 2363–2373. [Google Scholar] [CrossRef]
  89. López-Hernández, J.A.; Ortiz-Mejía, F.N.; Parada-Berríos, F.A.; Lara-Ascencio, F.; Vásquez-Osegueda, E.A. Caracterización morfoagronómica de cacao criollo (Theobroma cacao L.) y su incidencia en la selección de germoplasma promisorio en áreas de presencia natural en El Salvador. Rev. Minerva 2019, 2, 31–50. [Google Scholar] [CrossRef]
  90. Lachenaud, P.; Motamayor, J.C. The Criollo cacao tree (Theobroma cacao L.): A review. Genet. Resour. Crop Evol. 2017, 64, 1807–1820. [Google Scholar] [CrossRef]
  91. Afifah, E.N.; Sari, I.A.; Susilo, A.W.; Malik, A.; Fukusaki, E.; Putri, S.P. Characterization of fine-flavor cocoa in parent-hybrid combinations using metabolomics approach. Food Chem. X 2024, 24, 101832. [Google Scholar] [CrossRef] [PubMed]
  92. Kongor, J.E.; Hinneh, M.; de Walle, D.V.; Afoakwa, E.O.; Boeckx, P.; Dewettinck, K. Factors influencing quality variation in cocoa (Theobroma cacao) bean flavour profile—A review. Food Res. Int. 2016, 82, 44–52. [Google Scholar] [CrossRef]
  93. Avendaño-Arrazate, C.H.; Martínez-Bolaños, M.; Reyes-Reyes, A.L.; Aragón-Magadán, M.A.; Reyes-López, D.; López-Morales, F. Genotype-environment interaction of genotypes of cocoa in Mexico. Sci. Rep. 2025, 15, 15399. [Google Scholar] [CrossRef]
  94. Velasquez-Vasconez, P.A.; Castro-Zambrano, M.I.; Rodríguez-Cabal, H.A.; Castro, D.; Arbelaez, L.; Zambrano, J.C. Exploring Theobroma grandiflorum diversity to improve sustainability in smallholdings across Caquetá, Colombia. Int. J. Agron. 2025, 20, 100034. [Google Scholar] [CrossRef]
  95. Falque, M.; Vincent, A.; Vaissiere, B.E.; Eskes, A.B. Effect of pollination intensity on fruit and seed set in cacao (Theobroma cacao L.). Sex. Plant Reprod. 1995, 8, 354–360. [Google Scholar] [CrossRef]
  96. Fajardo, J.G.B.; Tellez, H.B.H.; Atuesta, G.C.P.; Aldana, A.P.S.; Arteaga, J.J.M. Antioxidant activity, total polyphenol content and methylxantine ratio in four materials of Theobroma cacao L. from Tolima, Colombia. Heliyon 2022, 8, e09402. [Google Scholar] [CrossRef]
  97. Akoa, S.P.; Onomo, P.E.; Ndjaga, J.M.; Ondobo, M.L.; Djocgoue, P.F. Impact of pollen genetic origin on compatibility, agronomic traits, and physicochemical quality of cocoa (Theobroma cacao L.) beans. Sci Hortic. 2021, 287, 110278. [Google Scholar] [CrossRef]
  98. Kulesza, E.; Hopfer, H.; Ziegler, G.R.; Calle-Bellido, J.; Roberts, C.; Umaharan, P.; Maximova, S.N.; Guiltinan, M.J. Correction: The effect of maternal and paternal genotype on seed lipid content, composition, and thermal traits in Theobroma cacao. Trop. Plant Biol. 2025, 18, 49. [Google Scholar] [CrossRef]
  99. Tušek, K.; Valinger, D.; Jurina, T.; Sokač Cvetnić, T.; Gajdoš Kljusurić, J.; Benković, M. Bioactives in cocoa: Novel findings, health benefits, and extraction techniques. Separations 2024, 11, 128. [Google Scholar] [CrossRef]
  100. Martins, L.M.; Santana, L.R.R.; Maciel, L.F.; Soares, S.E.; Ferreira, A.C.R.; Biasoto, A.C.T.; Bispo, E.d.S. Phenolic compounds, methylxanthines, and preference drivers of dark chocolate made with hybrid cocoa beans. Res. Soc. Dev. 2023, 12, e22912440782. [Google Scholar] [CrossRef]
  101. Oracz, J.; Nebesny, E.; Zyzelewicz, D.; Budryn, G.; Luzak, B. Bioavailability and metabolism of selected cocoa bioactive compounds: A comprehensive review. Crit. Rev. Food Sci. Nutr. 2020, 60, 1947–1985. [Google Scholar] [CrossRef]
  102. Lima, G.V.S.; Moura, F.G.; Gofflot, S.; Pinto, A.S.O.; de Souza, J.N.S.; Baeten, V.; Rogez, H. Targeted metabolomics for quantitative assessment of polyphenols and methylxanthines in fermented and unfermented cocoa beans from 18 genotypes of the Brazilian Amazon. Food Res. Int. 2025, 211, 116394. [Google Scholar] [CrossRef] [PubMed]
  103. Zapata-Alvarez, A.; Bedoya-Vergara, C.; Porras-Barrientos, L.D.; Rojas-Mora, J.M.; Rodríguez-Cabal, H.A.; Gil-Garzon, M.A.; Martinez-Alvarez, O.L.; Ocampo-Arango, C.M.; Ardila-Castañeda, M.P.; Monsalve-F, Z.I. Molecular, biochemical, and sensorial characterization of cocoa (Theobroma cacao L.) beans: A methodological pathway for the identification of new regional materials with outstanding profiles. Heliyon 2024, 10, e24544. [Google Scholar] [CrossRef]
  104. Collin, S.; Fisette, T.; Pinto, A.; Souza, J.; Rogez, H. Discriminating aroma compounds in five cocoa bean genotypes from two Brazilian States: White Kerosene-like Catongo, Red Whisky-like FL89 (Bahia), Forasteros IMC67, PA121 and P7 (Pará). Molecules 2023, 28, 1548. [Google Scholar] [CrossRef]
  105. Mat, S.A.; Mohd Daud, I.S.; Mohamad Rojie, M.H.; Hussain, N.; Rukayadi, Y. Effects of Candida sp. and Blastobotrys sp. starter on fermentation of cocoa (Theobroma cacao L.) beans and its antibacterial activity. J. Pure Appl. Microbiol. 2016, 10, 2501–2510. [Google Scholar] [CrossRef]
  106. Batista, N.N.; de Andrade, D.P.; Ramos, C.L.; Dias, D.R.; Schwan, R.F. Antioxidant capacity of cocoa beans and chocolate assessed by FTIR. Food Res. Int. 2016, 90, 313–319. [Google Scholar] [CrossRef]
  107. Deus, V.L.; Resende, L.M.; Bispo, E.S.; Franca, A.S.; Gloria, M.B.A. FTIR and PLS-regression in the evaluation of bioactive amines, total phenolic compounds and antioxidant potential of dark chocolates. Food Chem. 2021, 357, 129754. [Google Scholar] [CrossRef]
  108. Grillo, G.; Boffa, L.; Binello, A.; Mantegna, S.; Cravotto, G.; Chemat, F.; Dizhbite, T.; Lauberte, L.; Telysheva, G. Analytical dataset of Ecuadorian cocoa shells and beans. Data Brief 2019, 22, 56–64. [Google Scholar] [CrossRef]
  109. Santiago-Gómez, I.; Carrera-Lanestosa, A.; González-Alejo, F.A.; Guerra-Que, Z.; García-Alamilla, R.; Rivera-Armenta, J.L.; García-Alamilla, P. Pectin extraction process from cocoa pod husk (Theobroma cacao L.) and characterization by fourier transform infrared spectroscopy. ChemEngineering 2025, 9, 25. [Google Scholar] [CrossRef]
  110. Betancourt-Sambony, F.; Barrios-Rodríguez, Y.F.; Medina-Orjuela, M.E.; Gutiérrez-Guzmán, N.; Amorocho-Cruz, C.M.; Carranza, C.; Girón-Hernández, J. Relationship between physicochemical properties of roasted cocoa beans and climate patterns: Quality and safety implications. LWT-Food Sci. Technol. 2025, 216, 117320. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
