3. Results
The 32 soybean genotypes were subjected to a PCA–K-means analysis to separate the genotypes according to their amino acid contents and spectral behavior. This multivariate analysis enabled the separation of the genotypes into four distinct groups (Figure 4). The largest number of genotypes was grouped in cluster 1, totaling 11 genotypes, while cluster 2 contained the fewest genotypes, with five materials.
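The grouping step can be illustrated with a minimal sketch, assuming standardized variables, two retained principal components, and a deterministic farthest-point initialization for K-means. None of these settings is stated in the text, the function names are hypothetical, and the toy matrix is invented for illustration only (it is not the study data).

```python
import math

def standardize(rows):
    """Scale every column to zero mean and unit (sample) standard deviation."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / (len(c) - 1)) or 1.0
           for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(r, means, sds)] for r in rows]

def principal_axes(z, n_comp=2, iters=500):
    """Leading covariance eigenvectors via power iteration with Gram-Schmidt."""
    n, p = len(z), len(z[0])
    cov = [[sum(r[i] * r[j] for r in z) / (n - 1) for j in range(p)]
           for i in range(p)]
    axes = []
    for c in range(n_comp):
        v = [1.0 + i * (c + 1) for i in range(p)]  # arbitrary start vector
        for _ in range(iters):
            w = [sum(cov[i][j] * v[j] for j in range(p)) for i in range(p)]
            for u in axes:  # remove the directions already extracted
                dot = sum(a * b for a, b in zip(w, u))
                w = [a - dot * b for a, b in zip(w, u)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:
                break
            v = [a / norm for a in w]
        axes.append(v)
    return axes

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iters=100):
    """Lloyd's algorithm; initial centers chosen by farthest-point selection."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda pt: min(sq_dist(pt, c) for c in centers)))
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: sq_dist(pt, centers[c]))
                  for pt in points]
        updated = []
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            updated.append([sum(col) / len(members) for col in zip(*members)]
                           if members else centers[c])
        if updated == centers:  # assignments are stable
            break
        centers = updated
    return labels

def pca_kmeans(rows, k, n_comp=2):
    """Cluster the rows on their first n_comp principal-component scores."""
    z = standardize(rows)
    axes = principal_axes(z, n_comp)
    scores = [[sum(a * b for a, b in zip(r, u)) for u in axes] for r in z]
    return kmeans(scores, k)

# Invented toy matrix: 6 genotypes x 3 variables (two amino acids, one band).
toy = [
    [1.0, 2.0, 0.10], [1.1, 2.1, 0.12], [0.9, 1.9, 0.11],
    [3.0, 4.0, 0.30], [3.1, 4.2, 0.31], [2.9, 3.9, 0.29],
]
print(pca_kmeans(toy, k=2))
```

With the real data, the rows would be the 32 genotypes, the columns the amino acid contents and band reflectances, and k would be set to 4 to match the four groups reported above.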
Aspartic acid, glutamic acid, alanine, arginine, cystine, phenylalanine, glycine, and histidine contents were compared according to the Scott–Knott clustering to observe the highest and lowest levels of each amino acid within each group (Figure 5). Cluster 1 (C1) showed higher levels of aspartic acid, glutamic acid, alanine, cystine, and phenylalanine. Cluster 2 (C2) was characterized by the lowest concentrations of aspartic acid, glutamic acid, alanine, cystine, and phenylalanine, and exhibited higher levels of arginine and glycine. Cluster 3 (C3) presented intermediate levels for aspartic acid, glutamic acid, alanine, arginine, cystine, phenylalanine, glycine, and histidine. Cluster 4 (C4) showed higher levels only for aspartic acid, alanine, and cystine. Histidine levels were similar across all groups.
The amino acids isoleucine, leucine, lysine, proline, tyrosine, threonine, and valine exhibited higher concentrations in the C1 genotypes (Figure 6). C2 showed the lowest levels of isoleucine, leucine, lysine, proline, tyrosine, and valine, but presented an elevated methionine content. C3 exhibited intermediate levels of isoleucine, leucine, proline, serine, threonine, and valine, as well as high levels of lysine and tyrosine. C4 displayed high concentrations of lysine, methionine, tyrosine, and threonine, and intermediate concentrations of isoleucine, leucine, proline, and valine.
Regarding the spectral bands, each group exhibited a specific pattern across bands B1–B14, with C4 showing the highest reflectance in all bands, followed by C1. Overall, C3 had the lowest reflectance, and there was no difference between C2 and C3 in bands B7–B10 (Figure 7). Groups C1 and C2 displayed similar behavior in bands B12 and B13.
For bands B15–B28, cluster C4 continued to exhibit the highest reflectance, followed by C1, C3, and C2. Between B17 and B20, C2 and C3 showed similar spectral behavior. For bands B21–B26 and B28, C1 and C2 displayed the same spectral pattern (Figure 8).
The Pearson correlations exhibited distinct patterns within each group for the relationships between amino acids and spectral bands (Figure 9). Correlations among spectral bands were positive across all groups. In group 1, arginine and proline showed a strong negative correlation, while serine, alanine, tyrosine, threonine, cystine, isoleucine, glutamine, and lysine demonstrated strong positive correlations among themselves. In this group, several correlations between spectral bands and amino acids exceeded 0.20 in absolute value: phenylalanine showed negative correlations stronger than −0.20 with all 28 bands; aspartic acid showed negative correlations with all bands except B4 and B5; arginine showed negative correlations with bands B21–B26 and B28; glycine showed a negative correlation with B27; lysine showed negative correlations with B7–B10; and serine and tyrosine showed negative correlations with B5.
In group 2, correlations were more numerous and stronger than in group 1. Glutamic acid showed positive correlations with bands B12–B14 and B22–B23; alanine showed negative correlations with all bands, with the strongest correlation at B9; arginine reached −0.20 with bands B16 and B17; cystine showed positive correlations with B27 and B28; histidine showed positive correlations with B11–B14 and B20–B26, ranging from 0.21 to 0.39, and a negative correlation with B27; lysine showed negative correlations with B5–B10 and B27; methionine showed positive correlations with B1, B11–B14, B19–B26, and B28; proline exhibited positive correlations with B1–B11, B14–B18, B24, and B26–B28; serine showed negative correlations with B5 and B7–B10; tyrosine showed negative correlations with B11–B14; and valine showed a correlation of 0.24 with B28.
In group 3, the following correlations among amino acids were observed: aspartic acid showed strong positive correlations with alanine, serine, tyrosine, and threonine; glutamic acid showed strong positive correlations with phenylalanine, isoleucine, leucine, proline, and valine; alanine showed strong positive correlations with aspartic acid, cystine, serine, tyrosine, and threonine; arginine correlated positively with tyrosine; cystine showed strong positive correlations with alanine, histidine, and valine; phenylalanine showed strong positive correlations with glutamic acid, isoleucine, leucine, proline, and valine; histidine showed a strong correlation only with cystine; isoleucine, leucine, and proline showed strong positive correlations with glutamic acid, phenylalanine, proline, and valine; serine showed strong positive correlations with aspartic acid, alanine, tyrosine, and threonine; tyrosine showed strong correlations with aspartic acid, alanine, arginine, histidine, threonine, and serine; threonine showed positive correlations with aspartic acid, serine, tyrosine, and alanine.
Regarding the correlations between amino acids and spectral bands in group 3, aspartic acid showed negative correlations with B2–B11 and B15–B18; glutamic acid showed positive correlations with B2–B10, B15–B16, and B7; alanine was negatively correlated with B2–B10 and B15–B17, and positively with B28; arginine showed a negative correlation with B15 and positive correlations with B23, B26, and B28; histidine showed a positive correlation with B1; isoleucine showed positive correlations with B2–B4, B6–B10, and B15–B16; methionine showed negative correlations with B1–B10, B15–B19, and B27; proline was positively correlated with B2, B4, B6–B8, and B10; serine showed negative correlations with B2–B3, B5–B11, B15–B18, and B27, and positive correlations with B22–B23; tyrosine was negatively correlated with B2, B4, B7–B10, and B15–B16, and positively with B21–B23, B26, and B28; valine showed a positive correlation with B2.
In group 4, strong positive correlations were observed between aspartic acid and alanine; between glutamic acid and isoleucine, leucine, proline, threonine, and valine; and between alanine and valine, tyrosine, and serine. Arginine showed strong negative correlations with isoleucine, leucine, tyrosine, and proline, and a strong positive correlation with methionine. Glutamic acid and cystine both exhibited strong positive correlations with isoleucine, leucine, threonine, and valine. Phenylalanine displayed a strong positive correlation only with leucine. Glycine showed a strong negative correlation with serine. Histidine showed strong positive correlations with tyrosine and threonine. Isoleucine showed strong positive correlations with glutamic acid, cystine, leucine, proline, threonine, and valine. Leucine showed strong positive correlations with glutamic acid, cystine, phenylalanine, isoleucine, proline, threonine, and valine, and a negative correlation with arginine. Methionine showed a strong positive correlation with arginine and a negative correlation with tyrosine. Proline demonstrated strong positive correlations with glutamic acid, isoleucine, and leucine, and a negative correlation with arginine. Serine showed strong correlations with aspartic acid and alanine, and a negative correlation with glycine. Tyrosine showed strong positive correlations with alanine and histidine, and a negative correlation with arginine. Threonine showed strong correlations with glutamic acid, isoleucine, leucine, and valine. Valine showed strong correlations with glutamic acid, alanine, cystine, isoleucine, and threonine.
Regarding the correlations between bands and amino acids, glutamic acid showed correlations with bands B2, B5, B7–B10, B14–B17, and B22–B28. Arginine showed positive correlations above 0.20 with B14–B15 and B27, and a negative correlation with B26. Cystine showed negative correlations with B2, B8–B10, B15–B17, and B27, and positive correlations with B21, B23, B26, and B28. Glycine showed a correlation of −0.33 with B27. Isoleucine and leucine showed negative correlations with B2, B8–B10, and B15–B17, and positive correlations with bands B22–B28. Lysine showed correlations with B23–B25. Methionine showed positive correlations with B12–B14 and B21–B23. Proline showed negative correlations with B2, B3–B4, B7–B10, B15–B17, and B27, and positive correlations with B23–B26 and B28. Tyrosine showed a correlation of −0.30 with B27. Threonine showed negative correlations with B2, B3–B4, B7–B10, B15–B16, and B27, and positive correlations with B12, B13, B21–B26, and B28. Valine showed negative correlations with bands B8–B10, B15–B16, and B27, and positive correlations with B21–B26 and B28.
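The within-group screening of correlations between amino acids and bands described above can be sketched as follows. This is a minimal illustration, assuming the data for one group are held as lists of per-genotype values and that 0.20 in absolute value is the cut-off for a correlation worth reporting. The function names and toy values are hypothetical and are not the study data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when one variable is constant
    return sxy / math.sqrt(sxx * syy)

def notable_correlations(amino, bands, threshold=0.20):
    """Return {(amino_acid, band): r} for every pair with |r| >= threshold."""
    hits = {}
    for a_name, a_vals in amino.items():
        for b_name, b_vals in bands.items():
            r = pearson(a_vals, b_vals)
            if not math.isnan(r) and abs(r) >= threshold:
                hits[(a_name, b_name)] = round(r, 2)
    return hits

# Invented values for four genotypes of a single group.
amino_toy = {"proline": [1.0, 2.0, 3.0, 4.0]}
bands_toy = {
    "B1": [0.4, 0.3, 0.2, 0.1],  # falls as proline rises
    "B2": [0.3, 0.1, 0.1, 0.3],  # no linear trend with proline
}
print(notable_correlations(amino_toy, bands_toy))
```

Running the screen separately on each cluster mirrors the per-group reporting used in this section; with groups as small as five genotypes, individual coefficients should be read with caution.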
The groups formed by the different materials analyzed exhibited distinct profiles in both amino acid composition and spectral behavior. In general, materials in group 1 stood out for having the highest contents of most amino acids, including aspartic acid, glutamic acid, alanine, cystine, phenylalanine, isoleucine, leucine, lysine, proline, tyrosine, threonine, and valine, indicating superiority in terms of amino acid composition. Group 2 was characterized by the lowest concentrations of most amino acids, except for elevated levels of arginine, glycine, and methionine. Group 3 presented intermediate levels for most amino acids, except for high contents of lysine and tyrosine. Group 4 was distinguished by higher levels of lysine, methionine, tyrosine, and threonine, with intermediate values for isoleucine, leucine, proline, and valine.
Regarding spectral behavior, C4 exhibited the highest reflectance across all spectral bands (B1–B28), followed by C1. C3 showed the lowest reflectance in most bands, with a similar pattern to C2 in bands B7–B10 and to C1 in bands B21–B26 and B28. C1 and C2 displayed similar behavior in bands B12 and B13.
In terms of correlations, the bands that showed the strongest associations with amino acids, considering both positive and negative correlations, were as follows.
Notable positive correlations:
i. B21 to B28: These bands were positively correlated with amino acids such as glutamic acid, arginine, cystine, isoleucine, leucine, methionine, proline, threonine, and valine;
ii. B12 to B17: These bands were positively associated with glutamic acid, histidine, methionine, proline, and threonine;
iii. B23 and B26: These bands consistently showed positive correlations with arginine, cystine, proline, tyrosine, and valine.
Notable negative correlations:
i. B2 to B10: These bands were most frequently associated with negative correlations with aspartic acid, alanine, arginine, methionine, proline, serine, tyrosine, and valine;
ii. B15 to B18: These bands showed significant negative correlations with aspartic acid, alanine, cystine, methionine, proline, and serine;
iii. B27: This band was notable for negative correlations with glycine, histidine, lysine, methionine, proline, tyrosine, and valine.
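A band-level summary of this kind can be obtained by expanding the reported band ranges and counting how many amino acids are associated with each band. The sketch below is one possible way to do this, not the authors' procedure. It uses only a subset of the negative correlations reported for group 3 (aspartic acid, alanine, and methionine), and the helper names are hypothetical.

```python
import re
from collections import Counter

def expand_bands(spec):
    """Turn a range string such as 'B7–B10, B27' into [7, 8, 9, 10, 27]."""
    bands = []
    for part in spec.split(","):
        nums = [int(n) for n in re.findall(r"B(\d+)", part)]
        if len(nums) == 2:      # a range, e.g. B7–B10
            bands.extend(range(nums[0], nums[1] + 1))
        elif len(nums) == 1:    # a single band, e.g. B27
            bands.append(nums[0])
    return bands

def tally(associations):
    """Count, for each band number, how many amino acids list that band."""
    counts = Counter()
    for spec in associations.values():
        counts.update(set(expand_bands(spec)))
    return counts

# Subset of the negative correlations reported for group 3 in the text.
negative_group3 = {
    "aspartic acid": "B2–B11, B15–B18",
    "alanine": "B2–B10, B15–B17",
    "methionine": "B1–B10, B15–B19, B27",
}

counts = tally(negative_group3)
for band, n in sorted(counts.items()):
    print(f"B{band}: {n}")
```

For this subset, bands B2–B10 and B15–B17 are shared by all three amino acids, which is consistent with the negative-correlation ranges highlighted above.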
4. Discussion
Ref. [15] reported that among the most important priorities in soybean production and research is the sustainable provision of soybean protein, which can be achieved through the breeding of high-yielding and high-protein varieties using the crop’s genetic resources. Thus, grouping materials based on their amino acid (AA) content enabled the formation of distinct groups, with particular emphasis on the genetic materials in group 1, which in this study proved to be the most promising for supplying amino acids. Grouping genotypes by AA composition allows for the identification of genetic materials with desirable traits for different purposes, such as grain production or forage use, thereby guiding breeding programs toward the selection of targeted materials. It is important to note that all genetic materials were managed identically throughout the entire crop cycle, from fertilization to pest and disease control. This supports the conclusion that the materials in group 1 are indeed superior to the others in terms of AA content in their leaves. The other groups were notable for different amino acids, allowing for the selection of materials according to the specific components of interest to be improved.
Among the amino acids present at higher levels in the materials of group 1 is proline, an important amino acid associated with drought stress tolerance in plants, functioning as an osmoprotectant and playing a key role as an antioxidant [16,17]. Thus, the amino acid content, especially that of proline, may reflect metabolic efficiency, which can be related to productive performance and resistance to stress, particularly abiotic stress.
There is limited information in the literature regarding the application of hyperspectral data to biochemical parameters in plant leaves, especially concerning the relationship between hyperspectral data and amino acid contents in soybean leaves. In our study, a significant relationship was observed between amino acids and spectral bands calculated from hyperspectral data. Amino acid levels reflect the plant’s metabolic health and its interaction with environmental factors, potentially indicating stress conditions, while spectral data capture morphophysiological variations, such as changes in pigment composition. Integrating these datasets provides a more comprehensive understanding of plant physiology.
Overall, bands B21–B28 exhibited the highest numbers of relevant positive correlations with different amino acids, whereas bands B2–B10 and B15–B18 showed more negative correlations. This suggests that these spectral regions have greater potential for distinguishing specific amino acid concentrations in the analyzed samples. Bands B2–B10 cover the 370–500 nm range, B15–B18 span 650–684 nm, and B21–B28 encompass the red-edge region from 701 to 730 nm (B21–B23) and five discrete bands at 960, 1100, 1400, 1930, and 2200 nm in the NIR and SWIR regions. Ref. [4] reported that the bands most strongly associated with the majority of amino acids in maize leaves were mainly concentrated in the ranges of 505.39–604.95 nm and 651.21–714.10 nm, which is likely due to the influence of various pigments, especially the chlorophyll content. The authors also note that there are relatively few studies on the non-destructive detection of amino acid contents in leaves using spectroscopy, and that further research is needed to demonstrate the feasibility of the non-destructive detection of amino acids in leaves.
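To make the wavelength ranges above easier to place, the following sketch assigns a wavelength to a named spectral region. The region boundaries are one common convention and are an assumption here, as is the mapping of B24–B28 to the five discrete wavelengths in the order listed; neither is given explicitly in the text. Under these assumed boundaries, the 701–730 nm bands fall in the red-edge window, and only the 1400, 1930, and 2200 nm bands fall in the SWIR.

```python
# Assumed region boundaries in nm (conventions differ between authors).
REGIONS = [
    ("UV", 0, 400),
    ("visible", 400, 680),
    ("red edge", 680, 750),
    ("NIR", 750, 1300),
    ("SWIR", 1300, 2500),
]

def region(wavelength_nm):
    """Return the name of the assumed region containing the wavelength."""
    for name, lo, hi in REGIONS:
        if lo <= wavelength_nm < hi:
            return name
    return "outside range"

# Assumed assignment of the discrete bands to the wavelengths listed in the text.
discrete_bands = {"B24": 960, "B25": 1100, "B26": 1400, "B27": 1930, "B28": 2200}
by_region = {band: region(nm) for band, nm in discrete_bands.items()}

for band, name in by_region.items():
    print(band, discrete_bands[band], "nm:", name)
```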
In addition to the visible region, the SWIR region also contributes to establishing correlations with different amino acids. Ref. [18] found that the characteristic wavelengths of amino acid nitrogen were primarily distributed in the long-wave near-infrared region. The use of sensors offers potential opportunities not only at the laboratory level, but also through portable imaging systems and satellite-based platforms, enabling the application of this technology in the field. This approach allows for the assessment of plant performance at the tissue level, in individual plants, or across entire crops, utilizing high spectral resolution systems capable of evaluating biochemical and physiological traits in diverse plant populations [19]. Based on this spectral information, the relationship with the amino acid content can help identify genotypes with greater metabolic efficiency, nutritional quality, or stress resilience, thereby accelerating plant breeding programs.
The literature reports studies demonstrating that proteins exhibit absorption peaks in the ranges of 1460–1570 nm and 2000–2180 nm [20,21]. This justifies the negative correlations observed between amino acids and the bands within the SWIR region, as higher amino acid contents result in lower reflectance. Thus, the use of these spectral ranges enables the non-destructive detection of proteins, which should primarily be conducted using the shortwave infrared (SWIR) system with specific spectral bands [22], as previously mentioned. By relating spectral characteristics to functional groups such as C-H, N-H, and O-H, it is possible to indirectly detect the water content, total nitrogen, free amino acids, caffeine, and theanine [23,24,25]. The spectral characteristics of these functional groups are associated with ranges from the visible to the near-infrared spectrum, indicating that combining visible and near-infrared spectra provides useful information and can improve the accuracy of detecting specific compounds, such as amino acids in samples [8].
Our results reveal that soybean genotypes can be effectively grouped based on their amino acid profiles and spectral reflectance patterns, demonstrating a strong association between biochemical traits and leaf spectral responses. This ability to discriminate genotypes based on amino acid composition opens up relevant options for breeding programs, as it allows the selection of lines with superior nutritional profiles, especially those with higher concentrations of essential amino acids, which are important for high-quality food and feed formulations.
Reliably relating hyperspectral data to leaf amino acid contents represents a significant methodological advance. This approach allows for the inference of nutritional traits from spectral data, reducing the reliance on destructive, time-consuming, and costly laboratory analyses. It makes it possible to evaluate large genetic populations quickly, non-destructively, and with high efficiency, optimizing selection processes in genetic breeding programs.
In addition to the operational gains, the data from this study reinforce the predictive potential of hyperspectral sensing applied to plant physiology, with an emphasis on the protein nutrition of soybean crops, paving the way for future research on the relationship between the leaf spectrum and leaf AA contents. The adoption of this technology can significantly accelerate the identification of promising genotypes, simultaneously integrating nutritional and productive attributes.
This work, therefore, highlights the transformative role of hyperspectral sensing in genetic improvement and the soybean production chain. By enabling the early and accurate selection of genotypes with desirable amino acid profiles, the technology contributes not only to the development of cultivars with higher nutritional value but also to the sustainable intensification of agricultural production.