Physical Classification of Soybean Grains Based on Physicochemical Characterization Using Near-Infrared Spectroscopy

Leal, Marisa Menezes; Bilhalva, Nairiane dos Santos; Moraes, Rosana Santos de; Coradi, Paulo Carteri

doi:10.3390/agriengineering7060194


by Marisa Menezes Leal, Nairiane dos Santos Bilhalva, Rosana Santos de Moraes and Paulo Carteri Coradi *

Laboratory of Postharvest (LAPOS), Cachoeira do Sul Campus, Federal University of Santa Maria, Highway Taufik Germano, 3013, Passo D’Areia, Cachoeira do Sul, Santa Maria 96506-322, Rio Grande do Sul, Brazil

* Author to whom correspondence should be addressed.

AgriEngineering 2025, 7(6), 194; https://doi.org/10.3390/agriengineering7060194

Submission received: 18 May 2025 / Revised: 4 June 2025 / Accepted: 11 June 2025 / Published: 17 June 2025

(This article belongs to the Special Issue Latest Research on Post-Harvest Technology to Reduce Food Loss)


Abstract

The study aimed to determine the physical and physicochemical properties of soybean grains using NIR spectroscopy coupled with multivariate data analysis. The experiment was carried out in two stages: first, individual characterization of defects and healthy grains; then, analyses of samples classified into different types (type I, type II, basic standard, and out of type). The centesimal composition of the grains (crude protein, lipids, water content, crude fiber, starch, and ash) was determined by NIR spectroscopy, and the data were analyzed by ANOVA, Scott-Knott test, principal component analysis (PCA), k-means clustering, and Pearson correlation. The results showed significant variations between defects and commercial types in all the variables evaluated (p < 0.05), with an emphasis on germinated grains (higher protein content) and broken grains (higher fiber content). The PCA explained 66.6% of the total variance in the defect sets and 52.2% in the type sets, allowing the formation of groups defined by the clustering algorithm. Pearson correlations indicated important interactions between the chemical variables, such as the negative correlation between protein and crude fiber (r = −0.73) and between lipids and water content (r = −0.66). It is concluded that NIR spectroscopy combined with multivariate modeling allows the rapid, real-time assessment of soybean grain quality, which can reduce waste and increase the efficiency of post-harvest processes.

Keywords:

Glycine max L. Merrill; near-infrared spectroscopy (NIR); multivariate analysis; physicochemical properties

1. Introduction

The quality of soybeans in the post-harvest market is determined by physical classification. Accordingly, soybean grains can be divided into group I (intended for consumption in natura), which is categorized into type I and type II according to quality levels, and group II (intended for other purposes), categorized into basic standard and out-of-type samples (discarded or relocated). The quality of each category (type I, type II, and basic standard) is defined based on the tolerance percentages established by the technical regulations for soybeans [1,2]. Defects in soybean grains can occur because of factors such as inadequate crop management, pest attacks, adverse weather conditions, and failures in post-harvest processes such as drying at high temperatures.

According to the technical regulations for soybeans [1,2], defects in soybeans are grouped into serious and minor defects. Serious defects include burned, sour, and musty beans, while minor defects include crushed, damaged, fermented, germinated, immature, greenish, shriveled, split, and broken beans. In general, these defects devalue a product, impact its legal classification, and result in discounts on purchase and sales contracts.

Soy is one of the main agricultural products worldwide and is highlighted as a source of protein for humans and other animals [3]. The soybean grain is composed, on average, of 40% protein, 20% lipids, 5% fiber, and approximately 35% carbohydrates [4,5,6]. Owing to their diverse macronutrient composition, soybeans are important ingredients in the production of several commodities, including biodiesel [7], plant-based meat [8], animal feed [9], and other derivatives.

The physical appearance and chemical composition of soybeans are determining factors of their quality [10]. With advancements in the economy and science, it is essential to invest in technologies that improve the determination of grain compositions. Traditional analysis methods, although widely used, have disadvantages such as high cost of specialized labor, long analysis time, and risks of environmental contamination due to the use of chemical reagents [10].

Near-infrared (NIR) spectroscopy is an efficient alternative for assessing the quality of soybean grains. This technology enables the quick determination of a sample’s chemical composition through the absorption and vibration of organic matter in the infrared region [11]. However, its implementation is hindered by the high initial cost of equipment and the need for extensive calibration [12]. Shi et al. [13] developed NIR calibration models to predict the crude protein and amino acid contents in soybean. The authors also explored the impact of particle size, fat content, and type of NIR spectroscopy on the predictive capacity of the models. Their results demonstrate that the accuracy of the predictions increased when the grains were milled and the lipids were removed; however, the high cost of this sample preparation procedure should be considered. Ferreira [14] predicted that techniques such as mid-infrared spectroscopy and NIR spectroscopy could be applied as analytical tools in industries to monitor soybean quality and simultaneously measure several quality parameters. NIR spectroscopy reveals chemical composition and spectral data, allowing the development of multivariate calibration models.

Multivariate analysis allows multiple variables to be examined simultaneously. Among the most widely used multivariate analysis methods, principal component analysis (PCA), which reduces the dimensionality of data while retaining the most relevant information, has become an essential data interpretation tool [15]. This technique identifies variables or combinations of variables that produce underlying patterns in the data, allowing them to be analyzed and interpreted in chemical and physicochemical terms [16].

Another widely used multivariate analysis method is clustering, which classifies homogeneous data based on their similarities. This method is used to classify data into cluster types and to measure similarities across different sources [17]. Pearson correlation is a method that assesses the degree of linear relationship between two continuous variables, represented by r, which ranges from −1 to 1. A value of −1 indicates a perfect negative correlation; 1 indicates a perfect positive correlation; and 0 indicates no linear correlation between the variables.
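The definition of r given above can be written out directly. The following is a minimal sketch in Python, not the software used in this study; the function name `pearson_r` and the example values are hypothetical and serve only to illustrate the calculation.

```python
import math

def pearson_r(x, y):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares of the deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when one variable is constant")
    return sxy / math.sqrt(sxx * syy)

# Hypothetical values for illustration only (not measurements from this study)
protein = [33.0, 36.0, 38.0, 41.0]
fiber = [13.0, 7.0, 6.0, 5.0]
r = pearson_r(protein, fiber)
print(f"r = {r:.3f}")  # negative: fiber falls as protein rises in this toy data
```

A perfectly proportional pair such as `pearson_r([1, 2, 3], [2, 4, 6])` gives 1, and reversing one of the sequences gives −1.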

The physicochemical characterization of soy grains is essential for evaluating their quality for human and animal consumption and for biofuel production. Given the growing global demand for grains, it is essential to improve the analytical methods that guarantee quality and efficiency in production. Therefore, the present study aimed to determine the physical and physicochemical properties of soybean grains through NIR spectroscopy and multivariate analysis.

2. Materials and Methods

This study was carried out at the Post-Harvest Laboratory of the Federal University of Santa Maria (Cachoeira do Sul Campus) in conjunction with the Physical–Chemical Analysis Laboratory located at the Food Research Center of the University of Passo Fundo.

2.1. Obtaining Samples

Soybean samples from the 2022/2023 harvest were obtained after grain processing (without storage) at the Tritec Cereais and Bianchini storage units, both located in the municipality of Cachoeira do Sul, in the center of Rio Grande do Sul, in the Central Depression physiographic region. The city lies beside the Jacuí River, at latitude 30°0′45″ S, longitude 52°55′11″ W, and an altitude of 73 m. Cachoeira do Sul is the largest city in the Jacuí Valley, covering an area of 3735.164 km², and is the ninth largest municipality in the state.

2.2. Sample Preparation

The samples were separated according to Classification Normative Instructions No. 11 [1] and No. 37 [2], which govern the technical regulation of soybeans, where group I is intended for consumption in natura and group II for other purposes. Type I, type II, and basic standard are defined for the purpose of classifying the product’s quality.

The samples for product typing were composed with the maximum number of defects permitted by IN MAPA regulation No. 11/2007, in which type I grains may have a maximum of 4% of damaged grains (burned, moldy, immature, shriveled, fermented, and germinated), type II grains 6%, basic standard 8%, and off-type 12% or more. The samples were also composed of greenish, broken, and crushed grains: 2% (type I), 4% (type II), and 8% (basic standard) of green grains and 8% (type I), 15% (type II), and 30% (basic standard) of broken and crushed grains were added. The samples were quantified in grams (Table 1 and Table 2) for a better interpretation of the quantity of defects used in the samples (type I, type II, basic standard, and out of type).
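The conversion from tolerance percentages to grams can be sketched as follows. This is a minimal illustration that assumes the percentages are taken by mass of the 2 kg composite samples described in Section 2.4; the names `SAMPLE_MASS_G`, `TOLERANCES_PCT`, and `defect_mass_g` are hypothetical, and Tables 1 and 2 remain the authoritative record of the quantities actually used.

```python
SAMPLE_MASS_G = 2000  # assumption: 2 kg composite sample, as in Section 2.4

# Percentages stated in Section 2.2 for the samples composed for each type
TOLERANCES_PCT = {
    "type I": {"damaged": 4, "greenish": 2, "broken and crushed": 8},
    "type II": {"damaged": 6, "greenish": 4, "broken and crushed": 15},
    "basic standard": {"damaged": 8, "greenish": 8, "broken and crushed": 30},
}

def defect_mass_g(percent, sample_mass_g=SAMPLE_MASS_G):
    """Convert a percentage of the sample mass into grams."""
    return percent * sample_mass_g / 100

masses = {
    grain_type: {defect: defect_mass_g(pct) for defect, pct in defects.items()}
    for grain_type, defects in TOLERANCES_PCT.items()
}

for grain_type, defects in masses.items():
    print(grain_type, defects)
```

Under these assumptions, a type I sample would contain 80 g of damaged grains, 40 g of greenish grains, and 160 g of broken and crushed grains.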

2.3. Defects and Healthy Grains

To characterize individual samples of defective and healthy grains, 100 g of each category was separated. The defects in the soybean grains were as follows: burned, sour, musty, fermented, germinated, damaged, immature, and shriveled.

2.4. Group I, Group II, and Out-of-Type Samples

The samples in group I belonged to types I and II, whereas those in group II corresponded to the basic standard type. The out-of-type samples did not fit the established groups. These classifications were determined based on the maximum limits of tolerated defects according to the classification standard IN MAPA No. 11/2007 [1,2]. Next, 2 kg samples were prepared for group I, group II, and out-of-type soybeans (Table 1 and Table 2).

After selecting the number of defects to compose each category, the grains were homogenized using a quartering machine according to the maximum defect tolerances allowed. Subsequently, the 2 kg samples were divided into 100 subsamples of 20 g in airtight plastic bags, as shown in Figure 1.

The type I and type II, basic standard, and out-of-type samples were homogenized, packaged, and identified according to their category. Subsequently, they were subjected to an analysis of their centesimal composition by means of near-infrared spectroscopy (NIR). After the analysis, the samples were packaged and stored according to their identification for future studies.

2.5. Proximate Composition

The percentages of crude protein (CP), water content (WC), lipids (LIPs), crude fiber (CF), ash (ASH), and starch (ST) in individual defects and healthy grains and in the type I, type II, basic standard, and out-of-type samples were determined using NIR spectroscopy (FOSS, DS2500 spectrometer, Hillerød, Denmark). This method is based on the interaction between light and the sample, where the sample is illuminated by radiation of a specific wavelength in the NIR region. The samples are placed in a sampling capsule in which the light interacts with the sample. The radiation emitted by the spectrometer is then reflected by the sample to the detector, which measures the amount of reflected energy.

This technique allows for the rapid and accurate analysis of multiple grain constituents (CP, WC, LIPs, CF, ASH, ST). For better accuracy, the analyses were performed in triplicate and conducted in reflectance mode. Spectral data were recorded in the spectral range of 400 to 2500 nm. After collecting the spectral data of the centesimal composition of the soybean grain, an analysis of variance ANOVA (Sisvar version 5.8 software) was performed using the F test to verify the existence of significant differences between the experimental categories (defects and soybean type classifications). The grouping of the means was performed using the Scott-Knott test at 5% probability [18].
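The F test behind a one-way ANOVA compares the variation between category means with the variation within categories. The sketch below is a plain-Python illustration of that ratio, not the Sisvar procedure used in this study, and it does not implement the Scott-Knott grouping; the function name `one_way_anova_f` and the data are hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on a list of groups."""
    all_values = [v for g in groups for v in g]
    n_total = len(all_values)
    k = len(groups)
    grand_mean = sum(all_values) / n_total
    group_means = [sum(g) / len(g) for g in groups]
    # Between-group sum of squares: spread of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of the replicates around their own group mean
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)
    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within

# Hypothetical data: three categories with three replicates each
toy_groups = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0], [5.0, 6.0, 7.0]]
f_stat, df_b, df_w = one_way_anova_f(toy_groups)
print(f"F({df_b}, {df_w}) = {f_stat:.2f}")
```

The resulting F value would then be compared with the critical value of the F distribution for the two degrees of freedom at the chosen significance level (5% in this study).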

2.6. Statistical Analysis

All the statistical analyses were performed using a completely randomized design (CRD) experiment. To evaluate the defects, 11 experimental categories were employed (germinated, shriveled, sour, crushed, fermented, immature, healthy, broken, damaged, moldy, burned), each with three replicates. The obtained data were statistically analyzed using RStudio (Software version 1.0). Four categories were evaluated to classify the type I, type II, basic standard, and out-of-type samples, with 100 repetitions in triplicate, resulting in a total of 1200 analyses. The data were evaluated using the RStudio program, and the averages were compared using the Scott-Knott test at 5% probability. In addition, PCA and Pearson correlation were performed using the Rbio software (Software version 2.0), allowing for a more detailed evaluation of the relationships between the variables analyzed.

2.6.1. Principal Component Analysis

Principal component analysis (PCA) was used to visualize the data structure, and the k-means algorithm was applied to identify patterns of similarity between the samples. The first two principal components (PC1 and PC2) were used to construct biplots. PC1 is the component that explains the greatest share of the variability in the data, followed by PC2. The k-means algorithm was applied to the standardized data, and the resulting clusters were visualized in the biplots generated from the PCA. The number of clusters was defined empirically, based on the interpretation of the data and the explained variability.
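The workflow described above (standardize, project onto PC1 and PC2, cluster with k-means) can be sketched in a few lines. This is an illustrative Python and NumPy stand-in, not the R code used in this study; the helper names, the random seed, and the synthetic data are all assumptions, and the k-means routine is a plain Lloyd's algorithm rather than any specific package implementation.

```python
import numpy as np

def standardize(X):
    """Center each column and scale it to unit variance (z-scores)."""
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca(Z, n_components=2):
    """Return PC scores and the fraction of total variance explained by each kept PC."""
    cov = np.cov(Z, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder so PC1 explains the most variance
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = Z @ eigvecs[:, :n_components]
    explained = eigvals[:n_components] / eigvals.sum()
    return scores, explained

def kmeans(Z, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per row of Z."""
    rng = np.random.default_rng(seed)
    centers = Z[rng.choice(len(Z), size=k, replace=False)]
    for _ in range(n_iter):
        # Distance of every sample to every center, then nearest-center assignment
        dist = np.linalg.norm(Z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

# Synthetic stand-in data: 30 samples x 6 variables (CP, WC, LIPs, CF, ASH, ST)
rng = np.random.default_rng(42)
X = np.vstack([rng.normal(loc=m, scale=1.0, size=(10, 6)) for m in (0.0, 5.0, 10.0)])
Z = standardize(X)
scores, explained = pca(Z, n_components=2)
labels = kmeans(Z, k=3)
print("Variance explained by PC1 and PC2:", explained.round(3))
```

Plotting `scores[:, 0]` against `scores[:, 1]` and coloring the points by `labels` gives the kind of cluster view described for the biplots; the sum of `explained` corresponds to the cumulative variance reported for the first two components.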

In Table 3, each code (1 to 11) represents a sample composed exclusively of grains with a specific type of defect. The samples related to the defects were used to analyze the separation between the different types and to identify the natural grouping based on their multivariate characteristics. In Table 4, each sample was coded (1 to 400) and classified into four categories (type I, type II, basic standard, and out of type). One hundred samples of each type were analyzed, totaling 400 observations. These data were used to verify whether the types presented a defined separation in the principal components and to validate the groupings obtained by k-means. All the analyses were conducted in R software with the help of the packages “ggfortify”, “factoextra”, and “FactoMineR” [19,20].

2.6.2. Pearson Correlation

Pearson correlation was carried out in two stages: the correlation matrix between the variables was created first; then, a correlation network was built to facilitate the visualization of the interrelationships. The coefficients in the Pearson correlation matrix take values between −1 and 1, indicating the strength and direction of the linear relationships. A value closer to −1 or 1 indicates a strong correlation between the variables, and a value closer to 0 indicates a weaker correlation [21].

The correlation network visually represents the intensity of the correlations between the variables analyzed. Thick red lines indicate a strong negative correlation, whereas thick green lines indicate a strong positive correlation. The closer the value is to the extremes (−1 or 1), the thicker the lines become. As the values move away from the extremes, the lines become thinner, indicating a weak correlation. The analyses were performed using the Rbio software [22].
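The color and thickness convention of the correlation network can be expressed as a simple mapping from r to a line style. The sketch below is only an illustration of that convention; the function name `edge_style`, the linear scaling, and the `max_width` value are assumptions and do not describe how Rbio draws its networks internally.

```python
def edge_style(r, max_width=6.0):
    """Map a correlation coefficient to a (color, line width) pair for a network edge."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie in [-1, 1]")
    color = "red" if r < 0 else "green"   # red for negative, green for positive
    width = abs(r) * max_width            # thicker as |r| approaches 1 (assumed linear)
    return color, width

# Two coefficients quoted in the abstract, used here only as inputs
for pair, r in {"CP-CF": -0.73, "LIPs-WC": -0.66}.items():
    color, width = edge_style(r)
    print(f"{pair}: {color} line, width {width:.2f}")
```

With this mapping, a coefficient near 0 produces a line that is almost invisible, which matches the reading of thin lines as weak correlations.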

3. Results and Discussion

The results presented here were obtained through the physicochemical analysis of soybean grains, considering both the different defects and the commercial classifications (type I, type II, basic standard, and out of type). The variables analyzed were the contents of water, crude protein, lipids, starch, fiber, and ash, estimated by near-infrared (NIR) spectroscopy. Statistical analyses (ANOVA, Scott-Knott test, principal component analysis, k-means clustering, and Pearson correlation) were then employed for data interpretation.

Table 5 presents the characterization results of the analyses conducted in this study. The analysis of variance (ANOVA) provided data on the variations in the physicochemical properties of soybean grains as a function of defects. Statistically significant differences (p < 0.05) were observed for all the variables analyzed.

The coefficient of variation (CV) expresses the dispersion of data relative to the mean. A low CV indicates that the analyzed data were less dispersed around the mean, whereas a high CV indicates greater dispersion between values. In the analyzed data regarding grain defects, WC had an overall mean of 10.64% with a CV of 0.77%, demonstrating low dispersion. The overall mean for CP was 37.58%, and its CV was 0.39%, which also demonstrates low dispersion. The same was observed for LIPs, which had a CV of 0.32% relative to the overall mean of 16.20%. The CF content showed low dispersion, with a CV of 2.44% and an overall mean of 5.48%, similar to ASH, which presented a CV of 2.45% and a mean of 6.14%. In contrast, ST demonstrated high dispersion, with a CV of 13.33% relative to the overall mean of 0.65%.
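As a minimal sketch of the quantity discussed above, the CV can be computed as the sample standard deviation divided by the mean, expressed as a percentage. The function name and the example values are hypothetical. ANOVA software often reports the CV from the residual mean square instead; that variant is not shown here.

```python
import statistics

def coefficient_of_variation(values):
    """Return the CV in percent: sample standard deviation divided by the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("CV is undefined for a zero mean")
    return statistics.stdev(values) / mean * 100

# Hypothetical replicate values for illustration only
replicates = [9.0, 10.0, 11.0]
print(f"CV = {coefficient_of_variation(replicates):.1f}%")
```

Because the CV divides by the mean, a constituent present at a very low level can show a high CV even when its absolute spread is small: for starch, a CV of 13.33% on a mean of 0.65% corresponds to a standard deviation of roughly 0.09 percentage points.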

The mean test analyses performed on grains with defects showed a variance in the results between the defects and the physicochemical properties of soybean grains. However, despite the variation in the results, the physicochemical properties in relation to the defects maintained a similar behavior for each variable studied.

The different physical defects in soybean grains significantly influenced the centesimal composition. Germinated grains stood out with the highest crude protein content (41.40%), possibly due to the mobilization of reserves during germination, which increases enzymatic activity and protein solubility [23]. In contrast, broken grains presented the lowest protein concentration (33.17%) and highest fiber content (13.64%), which can be explained by the exposure of the tegument to air and oxidation reactions [24]. The water content also varied, with damaged and crushed grains showing values above 12%, probably because they have a greater capacity to absorb moisture [25]. Broken grains, on the other hand, had the lowest water content (8.44%), due to the greater exposure of the endosperm to air, facilitating water evaporation and grain dehydration [26]. These variations directly impact stability during storage and the final nutritional quality of the grain.

To better understand the characteristics of soybean defects, a similarity-based analysis was performed using PCA and k-means clustering. During k-means clustering (Figure 2), the defects were grouped into three subgroups. In addition, the program renamed the analyzed data by assigning numbers to each soybean defect, where 1 referred to germinated grains, 2 to shriveled grains, 3 to sour grains, 4 to crushed grains, 5 to fermented grains, 6 to immature grains, 7 to healthy grains, 8 to broken grains, 9 to damaged grains, 10 to musty grains, and 11 to burned grains.

In group I, the defects (shriveled, sour, crushed, immature, damaged) stood out, with healthy grains having a lower impact. Group II comprised only broken grains, and in group III, germinated, fermented, moldy, and burned grains remained, with a higher ash content and protein alterations. Thus, it can be seen that the PCA transformed the original variations in the soybean grains into new axes, reducing the complexity. The first two PCs explain 66.6% of the total variation in the centesimal composition of samples with soybean defects, with PC1 and PC2 accounting for 36.5% and 30.1%, respectively. According to Oliveira et al. [27], the sum of the first two PCs should exceed 70%. Although the value found was below this threshold, it was sufficient to separate the defects in the soybean grains for the subsequent analyses.

In the Pearson correlation matrix (Table 6), the interactions between the variables studied were numerically demonstrated, where each row of the matrix represents a sample of the 11 defects with the centesimal composition measured through NIR spectroscopy. The strong negative correlation between crude protein and crude fiber (r = −0.735) indicates that the increase in fiber is associated with a reduction in protein, a common effect in grains with structural damage such as cracked or moldy grains. The negative correlation between water content and lipids (r = −0.665) indicates that wetter grains tend to contain less fat, possibly due to moisture-induced lipid degradation processes. The values range from −1 to 1, indicating the intensity of the correlation between the variables. Negative correlations are shown in red, and positive correlations are shown in green (Figure 3).

The variables analyzed for soybean grain defects were correlated (Figure 3), which is consistent with the results shown in Table 6. The LIPs and WC were strongly and negatively correlated with each other, as were CP and CF. The WC showed a weak negative correlation with ST, CF, and ASH. CP was negatively correlated with LIPs and positively correlated with WC, ST, and ASH. CF showed a weak negative correlation with ST and a positive correlation with LIPs and ASH.

Statistical analyses indicate that the reduction in crude protein was accompanied by an increase in CF, as evidenced by the Pearson correlation network (Figure 3), which shows the inverse relationship between the variables. Ziegler et al. [23] studied the effects of water content (12% and 15%) and storage temperature (11, 18, 25, and 32 °C) on the functional properties of soy protein for 12 months. They reported that a water content of 12% combined with storage at 25 °C and 32 °C reduced the protein solubility by 14.1% and 20.6%, respectively.

Cañizares et al. [26] evaluated the effect of different percentages (0, 15%, 30%, and 35%) of broken grains in soybean samples stored for 12 months and concluded that the presence of 15% broken grains did not affect the quality of the sample. However, when the percentage of broken grains was 30%, the quality of the sample was negatively affected, with an increase of up to 75% in moldy grains.

NIR spectroscopy of the grain samples was performed to assess their composition and physicochemical properties. This analysis was carried out on whole grains, making it possible to quantify the CP, WC, CF, ST, LIPs, and ASH present in the various types of soybean grains.

Table 7 presents the results of the analysis performed in this study. An analysis of variance provided data on the variations in the physicochemical properties of soybean grains as a function of classification type. For all the parameters analyzed for classifying the soybean grains, the probability of error associated with the calculated F was below the significance level, indicating significant differences between the groups analyzed. Thus, NIR spectroscopy showed that the defects in the soybean grains had a considerable influence on their physicochemical composition.

The coefficients of variation presented similar values between 1.19% and 5.71%. The general averages of the variables studied were 35.39% for CP, 16.79% for LIPs, 11.32% for WC, 5.24% for CF, 4.93% for ASH, and 1.07% for ST.

When soybean grain samples containing the maximum allowed defects were analyzed, a significant variation was observed in their physicochemical properties based on the type of soybean grain (Table 7). The WC ranged between 11.08% and 11.54%, with type I and out-of-type samples demonstrating the best values. Basic standard samples showed the highest CP content of 35.88%. Additionally, basic standard and out-of-type samples demonstrated high LIP contents of 16.89% and 17.02%, respectively. Type II showed better results for ST and CF contents. The ASH content varied between 4.91% and 4.95%, with higher values observed in type I, type II, and out-of-type samples.

The samples were classified according to their similarities using PCA and k-means clustering. To facilitate the understanding of the data, the samples were numbered by type: type I (1–100), type II (101–200), basic standard (201–300), and out of type (301–400). The k-means clustering (Figure 4) divided them into four large groups: group 1 contained 130 samples, group 2 contained 61 samples, group 3 contained 65 samples, and group 4 contained 144 samples, totaling 400 soybean grain samples. The first two PCs accounted for 52.2% of the total variation in the centesimal composition of the samples, with PC1 accounting for 29.5% and PC2 accounting for 22.7%.

The Pearson correlation matrix shown in Table 8 was constructed based on 400 observations corresponding to the samples of soybean grains classified between types I, II, basic standard, and out of type. The columns of the matrix used six physical–chemical variables (ASH, WC, ST, CP, CF, LIPs). The correlation between the studied variables, referring to the types of classification of the soybean grain, is numerically demonstrated in the table.

The variables analyzed for the types of soybean samples were correlated (Figure 5), consistent with the results shown in Table 8. CP showed a strong negative correlation with ASH (r = −0.9953), suggesting that ash accumulation accompanies protein degradation in lower-quality types [23]; weak negative correlations with WC, CF, and ST; and a weak positive correlation with LIPs. LIPs demonstrated a strong negative correlation with CF (r = −0.931) and ST (r = −0.955), indicating that grains with a higher energy content (lipids) tend to have fewer structural and digestible carbohydrates [6,28]. In addition, LIPs demonstrated a weak negative correlation with ASH and a positive correlation with WC. ST and CF showed a strong positive correlation. ST showed weak negative and weak positive correlations with ASH and WC, respectively. CF showed a weak negative correlation with WC and a weak positive correlation with ASH.

The correlation network demonstrated a high negative correlation between CP and ASH, LIPs and ST, and LIPs and CF. In contrast, CP showed a strong positive correlation with ST. In addition, CP showed a weak negative correlation with WC, while ASH showed a weak positive correlation with WC. Ziegler et al. [28] and Coradi et al. [29] observed an increase in the ASH contents of soybeans stored with water contents of 9%, 12%, 15%, and 18% at 11, 18, 25, and 32 °C for 12 months. After storage at 32 °C, the ASH content increased from 5.1% to 6.1%. The authors attributed these changes to inadequate storage practices, because, after 12 months of storage at 32 °C, the ash content increased, and the protein and lipid contents decreased. These conditions facilitate fermentation and lead to the degradation of compounds, which also contributes to the increase in ash content.

Different types of soybean grains were compared in terms of quality, and the parameters analyzed (CP, ASH, WC, CF, ST, and LIPs) showed similarities without significant variation. The same was observed for defects.

4. Conclusions

The results showed that near-infrared (NIR) spectroscopy combined with multivariate analysis is an effective, fast, and non-destructive tool for the physicochemical characterization and quality classification of soybean grains with different types of defects and typing categories. The results showed significant variations in the centesimal composition between individual defects, as well as between the types established by IN MAPA No. 11/2007, especially in the contents of crude protein, lipids, water, and ash. The samples composed with different defects showed that the physical defect alone does not always indicate the quality level of the sample; it is the centesimal composition that indicates quality.

Principal component analysis explained 66.6% of the total variance of defects in the first two components, while PCA related to types explained 52.2%; these values are acceptable for discrimination and pattern visualization purposes. k-means clustering was effective in adequately separating the sample groups into three clusters for defects and four clusters for type categories, thus demonstrating the applicability of unsupervised models in the qualitative classification of grains. Pearson correlations showed a strong negative correlation between crude protein and crude fiber (r = −0.73); the same occurred for lipids and water content (r = −0.66), expanding the potential of multivariate monitoring for quality control.

Thus, the results obtained in this work highlight the potential of integration between NIR spectroscopy and multivariate statistical analysis as a promising strategy for the real-time monitoring of soybean grain quality, contributing to the optimization of post-harvest processes, thus minimizing waste and improving the efficiency of soybean grain analysis.

Author Contributions

R.S.d.M.: Methodology, Formal analysis, Investigation, Writing—original draft. M.M.L. and N.d.S.B.: Formal analysis, Writing—review and editing. P.C.C.: Project administration, Methodology, Formal analysis, Investigation, Writing—review and editing, and Writing—original draft. All authors have read and agreed to the published version of the manuscript.

Funding

This study was financially supported by CAPES (Coordination for the Improvement of Higher Education Personnel, Financial Code 001), CNPq (National Council for Scientific and Technological Development), number 304966/2023-1, and FAPERGS-RS (Research Support Foundation of the State of Rio Grande do Sul), number 24/2551-0001150-1.

Data Availability Statement

All the data has been made available. The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors would like to thank the Federal University of Santa Maria (UFSM) and the Post-Harvest Laboratory (LAPOS) Research Group on Post-Harvest Innovation: Technology, Quality, and Sustainability. We would also like to thank the Physical–Chemical Analysis Laboratory of the Food Research Center of the University of Passo Fundo (UPF) for their contribution to the experiments.

Conflicts of Interest

The authors declare no conflict of interest.

Figure 1. Characterization of soybean samples: (A) sample composed of 2 kg soybeans; (B) homogenization of the 2 kg sample; (C) weighing of 20 g of the grains; (D) packaging of the 20 g samples; (E) division of the 2 kg sample into 100 subsamples of 20 g.

Figure 2. Principal component analysis of defective and healthy grains: ash (ASH), water content (WC), starch (ST), crude protein (CP), crude fiber (CF), lipids (LIPs).

Figure 3. Pearson correlation network between the centesimal composition variables of defective and healthy grains: ash (ASH), water content (WC), starch (ST), crude protein (CP), crude fiber (CF), lipids (LIPs).

Figure 4. Principal component analysis of the centesimal composition of type I, type II, basic standard, and out-of-type samples: ash (ASH), water content (WC), starch (ST), crude protein (CP), crude fiber (CF), lipids (LIPs).

Figure 5. Pearson correlation network between the centesimal composition variables of type I, type II, basic standard, and out-of-type samples: ash (ASH), water content (WC), starch (ST), crude protein (CP), crude fiber (CF), lipids (LIPs). Green lines represent positive correlations, red lines indicate negative correlations, and line thickness corresponds to the strength of the correlation.

Table 1. Defects in the group I soybean sample.

Group I
Type I			Type II		
Grains	Grams (for 2 kg)	Damaged max (g)	Grains	Grams (for 2 kg)	Damaged max (g)
Sour	14	80	Sour	20	120
Burned	6		Burned	20	
Musty	5		Musty	10	
Immature	15		Immature	20	
Shriveled	15		Shriveled	20	
Fermented	15		Fermented	20	
Germinated	10		Germinated	10	
Greenish	40		Greenish	80	
Crushed and broken	160		Crushed and broken	300	
Healthy	1720		Healthy	1500	

Table 2. Percentage of compound defects in the 2 kg sample of soybeans for group II and out-of-type samples.

Group II			Out of Type		
Basic standard			Out of type		
Grains	Grams (for 2 kg)	Damaged max (g)	Grains	Grams (for 2 kg)	Damaged max (g)
Sour	40	160	Sour	50.9	200
Burned	20		Burned	23.3	
Musty	20		Musty	22.1	
Immature	20		Immature	22.4	
Shriveled	20		Shriveled	27	
Fermented	20		Fermented	27	
Germinated	20		Germinated	27.3	
Greenish	160		Greenish	200	
Crushed and broken	600		Crushed and broken	750	
Healthy	1080		Healthy	850	
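
The gram values in Tables 1 and 2 can be cross-checked arithmetically: within each type, the seven damaged classes (sour through germinated) should add up to the stated damaged maximum, and all classes together should add up to the 2 kg sample. The sketch below transcribes the tables and performs that check. The names `SAMPLES`, `DAMAGED_MAX`, and `summarize` are illustrative and do not come from the study.

```python
# Masses in grams per 2 kg sample, transcribed from Tables 1 and 2.
# The seven classes below are the ones counted against the "damaged max" limit.
DAMAGED = ("Sour", "Burned", "Musty", "Immature",
           "Shriveled", "Fermented", "Germinated")

SAMPLES = {
    "Type I": {"Sour": 14, "Burned": 6, "Musty": 5, "Immature": 15,
               "Shriveled": 15, "Fermented": 15, "Germinated": 10,
               "Greenish": 40, "Crushed and broken": 160, "Healthy": 1720},
    "Type II": {"Sour": 20, "Burned": 20, "Musty": 10, "Immature": 20,
                "Shriveled": 20, "Fermented": 20, "Germinated": 10,
                "Greenish": 80, "Crushed and broken": 300, "Healthy": 1500},
    "Basic standard": {"Sour": 40, "Burned": 20, "Musty": 20, "Immature": 20,
                       "Shriveled": 20, "Fermented": 20, "Germinated": 20,
                       "Greenish": 160, "Crushed and broken": 600,
                       "Healthy": 1080},
    "Out of type": {"Sour": 50.9, "Burned": 23.3, "Musty": 22.1,
                    "Immature": 22.4, "Shriveled": 27, "Fermented": 27,
                    "Germinated": 27.3, "Greenish": 200,
                    "Crushed and broken": 750, "Healthy": 850},
}

# Damaged limits in grams, as given in the table headers.
DAMAGED_MAX = {"Type I": 80, "Type II": 120,
               "Basic standard": 160, "Out of type": 200}


def summarize(masses):
    """Return (damaged mass, total mass) in grams for one sample recipe."""
    damaged = sum(masses[name] for name in DAMAGED)
    total = sum(masses.values())
    return damaged, total


for sample_type, masses in SAMPLES.items():
    damaged, total = summarize(masses)
    print(f"{sample_type}: damaged = {damaged:.1f} g "
          f"(limit {DAMAGED_MAX[sample_type]} g), total = {total:.1f} g")
```

A small tolerance is advisable when comparing the out-of-type sums, because its masses are decimal fractions and are stored as floating-point values.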

Table 3. Codification of soybean grain defects to define groups.

Codification	Soybean Grain Defects
1	Germinated
2	Shriveled
3	Sour
4	Crushed
5	Fermented
6	Immature
7	Healthy
8	Broken
9	Damaged
10	Musty
11	Burned

Table 4. Codification of soybean grain types to define groups.

Codification	Classification by Type
1–100	Type I
101–200	Type II
201–300	Basic standard
301–400	Out of type
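
The codification in Table 4 assigns consecutive blocks of 100 sample codes to each type. A minimal sketch of that lookup, assuming integer codes from 1 to 400 (the function name `type_for_sample` is hypothetical):

```python
TYPE_LABELS = ("Type I", "Type II", "Basic standard", "Out of type")


def type_for_sample(code: int) -> str:
    """Map a sample code (1-400) to its classification, following Table 4."""
    if not 1 <= code <= 400:
        raise ValueError(f"sample code {code} is outside the range 1-400")
    # Codes 1-100 fall in block 0, 101-200 in block 1, and so on.
    return TYPE_LABELS[(code - 1) // 100]


print(type_for_sample(1), type_for_sample(250), type_for_sample(400))
```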

Table 5. Analysis of variance and means of the centesimal composition of defects in soybean grains.

Samples	Water Content (%)	Crude Protein (%)	Lipids (%)	Starch (%)	Crude Fiber (%)	Ash (%)
Crushed	12.55 b	35.35 h	14.75 i	0.21 d	5.14 d	5.09 f
Sour	10.72 d	36.39 g	18.34 b	2.06 b	4.69 e	4.93 f
Shriveled	10.23 e	38.07 e	14.12 j	0.79 c	5.96 b	6.00 d
Damaged	13.12 a	38.06 e	12.74 k	0.00 e	5.11 d	5.61 e
Fermented	9.51 f	38.81 c	15.51 f	2.59 a	5.72 c	6.06 d
Germinated	10.79 d	41.40 a	15.10 h	0.77 c	3.63 g	7.98 a
Immature	11.00 c	37.41 f	15.28 g	0.01 e	4.23 f	5.90 d
Musty	10.64 d	38.36 d	17.62 d	0.00 e	3.81 g	6.80 c
Broken	8.44 h	33.17 i	18.19 c	0.00 e	13.64 a	6.97 c
Burned	9.16 g	39.83 b	19.19 a	0.06 e	3.36 h	7.25 b
Healthy	10.87 c	36.54 g	17.41 e	0.67 c	4.96 d	4.92 f
Pr > Fc	0.000 *	0.000 *	0.000 *	0.000 *	0.000 *	0.000 *
CV (%)	0.77	0.39	0.32	13.33	2.44	2.45
General average	10.64	37.58	16.20	0.65	5.47	6.14

Means followed by the same lowercase letter within a column do not differ according to the Scott-Knott test at a 0.05 probability of error. * Significant at 5% probability of error (Pr < 0.05); ns, not significant (Pr ≥ 0.05); CV, coefficient of variation.
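
Figure 2 summarizes these six variables by principal component analysis. As a rough illustration of that kind of decomposition, the sketch below standardizes the treatment means from Table 5, builds their correlation matrix, and estimates the first principal component by power iteration. This is an assumption-laden example: the input the authors passed to their PCA and the software settings they used are not stated in this section, so the output need not match Figure 2. All function names are illustrative.

```python
import math

# Treatment means from Table 5; columns follow HEADER.
HEADER = ("WC", "CP", "LIPs", "ST", "CF", "ASH")
MEANS = [
    (12.55, 35.35, 14.75, 0.21, 5.14, 5.09),   # Crushed
    (10.72, 36.39, 18.34, 2.06, 4.69, 4.93),   # Sour
    (10.23, 38.07, 14.12, 0.79, 5.96, 6.00),   # Shriveled
    (13.12, 38.06, 12.74, 0.00, 5.11, 5.61),   # Damaged
    (9.51, 38.81, 15.51, 2.59, 5.72, 6.06),    # Fermented
    (10.79, 41.40, 15.10, 0.77, 3.63, 7.98),   # Germinated
    (11.00, 37.41, 15.28, 0.01, 4.23, 5.90),   # Immature
    (10.64, 38.36, 17.62, 0.00, 3.81, 6.80),   # Musty
    (8.44, 33.17, 18.19, 0.00, 13.64, 6.97),   # Broken
    (9.16, 39.83, 19.19, 0.06, 3.36, 7.25),    # Burned
    (10.87, 36.54, 17.41, 0.67, 4.96, 4.92),   # Healthy
]


def standardize(rows):
    """Return the columns of `rows`, each scaled to mean 0 and unit variance."""
    columns = []
    for col in zip(*rows):
        mean = sum(col) / len(col)
        sd = math.sqrt(sum((x - mean) ** 2 for x in col) / (len(col) - 1))
        columns.append([(x - mean) / sd for x in col])
    return columns


def correlation_matrix(rows):
    """Correlation matrix of the variables (columns) in `rows`."""
    z = standardize(rows)
    n = len(rows)
    return [[sum(a * b for a, b in zip(zi, zj)) / (n - 1) for zj in z]
            for zi in z]


def leading_eigenpair(matrix, iterations=500):
    """Estimate the largest eigenvalue and its unit eigenvector by power iteration."""
    size = len(matrix)
    vec = [1.0 / math.sqrt(size)] * size
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(size))
               for i in range(size)]
        norm = math.sqrt(sum(x * x for x in nxt))
        vec = [x / norm for x in nxt]
    # Rayleigh quotient of the converged vector gives the eigenvalue estimate.
    value = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(size))
                for i in range(size))
    return value, vec


eigenvalue, loadings = leading_eigenpair(correlation_matrix(MEANS))
# The eigenvalues of a correlation matrix sum to the number of variables.
print(f"PC1 explains {eigenvalue / len(HEADER):.1%} of the standardized variance")
for name, weight in zip(HEADER, loadings):
    print(f"{name}: {weight:+.3f}")
```

Standardizing first means the decomposition works on correlations rather than covariances, which keeps variables with large numeric ranges, such as crude protein, from dominating the components.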

Table 6. Pearson correlation matrix between the variables of proximate compositions of defective and healthy grains.

Analysis	ASH	WC	ST	CP	CF	LIPs
ASH	1	−0.490995	−0.254950	0.496860	0.073176	0.180163
WC	−0.490995	1	−0.205297	0.062498	−0.459225	−0.665010
ST	−0.254950	−0.205297	1	0.151098	−0.100469	0.041285
CP	0.496860	0.062498	0.151098	1	−0.735128	−0.198849
CF	0.073176	−0.459225	−0.100469	−0.735128	1	0.154646
LIPs	0.180163	−0.665010	0.041285	−0.198849	0.154646	1

Ash (ASH), water content (WC), starch (ST), crude protein (CP), crude fiber (CF), lipids (LIPs).
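
As an illustration of how entries such as those in Table 6 are obtained, the sketch below computes the Pearson coefficient for two variable pairs using the rounded treatment means from Table 5. Treating the eleven means as the input is an assumption, not something this section states. Under that assumption the results should land close to the tabulated −0.735 (CP and CF) and −0.665 (WC and LIPs), with small differences expected from rounding.

```python
import math

# Treatment means from Table 5, in row order (Crushed ... Healthy).
WC = [12.55, 10.72, 10.23, 13.12, 9.51, 10.79, 11.00, 10.64, 8.44, 9.16, 10.87]
CP = [35.35, 36.39, 38.07, 38.06, 38.81, 41.40, 37.41, 38.36, 33.17, 39.83, 36.54]
LIPS = [14.75, 18.34, 14.12, 12.74, 15.51, 15.10, 15.28, 17.62, 18.19, 19.19, 17.41]
CF = [5.14, 4.69, 5.96, 5.11, 5.72, 3.63, 4.23, 3.81, 13.64, 3.36, 4.96]


def pearson(x, y):
    """Pearson correlation: covariance divided by the product of the spreads."""
    if len(x) != len(y):
        raise ValueError("both variables need the same number of observations")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    return cross / math.sqrt(ss_x * ss_y)


print(f"r(CP, CF)   = {pearson(CP, CF):.3f}")
print(f"r(WC, LIPs) = {pearson(WC, LIPS):.3f}")
```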

Table 7. Analysis of variance and test of means of the centesimal composition of various soybean samples.

Samples	Water Content (%)	Crude Protein (%)	Lipids (%)	Starch (%)	Crude Fiber (%)	Ash (%)
Type I	11.54 a	35.14 c	16.75 b	1.00 b	5.23 b	4.95 a
Type II	11.16 b	35.20 c	16.51 b	1.50 a	5.31 a	4.94 a
Basic standard	11.08 c	35.88 a	16.89 a	0.97 b	5.21 b	4.91 b
Out of type	11.49 a	35.35 b	17.02 a	0.81 c	5.21 b	4.94 a
Pr > Fc	0.000 *	0.000 *	0.001 *	0.000 *	0.000 *	0.000 *
CV (%)	2.12	1.71	5.71	51.9	1.19	1.27
General average	11.32	35.39	16.79	1.07	5.24	4.93

Means followed by the same lowercase letter within a column do not differ according to the Scott-Knott test at a 0.05 probability of error. * Significant at 5% probability of error (Pr < 0.05); ns, not significant (Pr ≥ 0.05); CV, coefficient of variation.
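
Because Table 4 assigns 100 samples to each type, the design is balanced and the general average of each variable should equal the plain average of the four type means. The sketch below checks the "General average" row of Table 7 on that basis. The tolerance allows for the two-decimal rounding of the published means, and the variable names are illustrative.

```python
# Type means from Table 7, in the order Type I, Type II, Basic standard, Out of type.
TYPE_MEANS = {
    "Water content": [11.54, 11.16, 11.08, 11.49],
    "Crude protein": [35.14, 35.20, 35.88, 35.35],
    "Lipids": [16.75, 16.51, 16.89, 17.02],
    "Starch": [1.00, 1.50, 0.97, 0.81],
    "Crude fiber": [5.23, 5.31, 5.21, 5.21],
    "Ash": [4.95, 4.94, 4.91, 4.94],
}

# "General average" row as reported in Table 7.
REPORTED = {"Water content": 11.32, "Crude protein": 35.39, "Lipids": 16.79,
            "Starch": 1.07, "Crude fiber": 5.24, "Ash": 4.93}

TOLERANCE = 0.006  # slightly more than half of the last reported digit


def grand_mean(means):
    """Average of group means; equals the overall mean when groups are equal-sized."""
    return sum(means) / len(means)


for variable, means in TYPE_MEANS.items():
    computed = grand_mean(means)
    agrees = abs(computed - REPORTED[variable]) <= TOLERANCE
    print(f"{variable}: computed {computed:.4f}, reported {REPORTED[variable]:.2f}, "
          f"{'consistent' if agrees else 'check'}")
```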

Table 8. Pearson correlation matrix representing the centesimal composition variables of type I, type II, basic standard, and out-of-type samples.

Analysis	ASH	WC	ST	CP	CF	LIPs
ASH	1	0.659713	0.284403	−0.9953	0.439386	−0.40695
WC	0.659713	1	−0.520113	−0.637201	−0.383758	0.341488
ST	0.284403	−0.520113	1	−0.324148	0.979704	−0.955333
CP	−0.9953	−0.637201	−0.324148	1	−0.465986	0.465117
CF	0.439386	−0.383758	0.979704	−0.465986	1	−0.931077
LIPs	−0.40695	0.341488	−0.955333	0.465117	−0.931077	1

Ash (ASH), water content (WC), starch (ST), crude protein (CP), crude fiber (CF), lipids (LIPs).

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
