Towards Verifying the Imported Soybeans of China Using Stable Isotope and Elemental Analysis Coupled with Chemometrics

Verifying the geographical origin of soybeans (Glycine max [Linn.] Merr.) is a major challenge as there is little available information regarding non-parametric statistical origin approaches for Chinese domestic and imported soybeans. Commercially procured soybean samples from China (n = 33) and soybeans imported from Brazil (n = 90), the United States of America (n = 6), and Argentina (n = 27) were collected to characterize different producing origins using stable isotopes (δ2H, δ18O, δ15N, δ13C, and δ34S), non-metallic element content (% N, % C, and % S), and 23 mineral elements. Chemometric techniques such as principal component analysis (PCA), linear discriminant analysis (LDA), and BP–artificial neural network (BP-ANN) were applied to classify each origin profile. The feasibility of stable isotopes and elemental analysis combined with chemometrics as a discrimination tool to determine the geographical origin of soybeans was evaluated, and origin traceability models were developed. A PCA model indicated that origin discriminant separation was possible between the four soybean origins. Soybean mineral element content was found to be more indicative of origin than stable isotopes or non-metallic element contents. A comparison of two chemometric discriminant models, LDA and BP-ANN, showed both achieved an overall accuracy of 100% for testing and training sets when using a combined isotope and elemental approach. Our findings elucidate the importance of a combined approach in developing a reliable origin labeling method for domestic and imported soybeans in China.


Introduction
Cultivated soybean (Glycine max [Linn.]Merr.) evolved from wild soybean (G.soja Sieb.& Zucc.) and originated in China around 5000 years ago [1].Recognized as one of the foremost contributors to sources of oil, food, traditional Chinese medicine, and feed crops, soybeans also play a vital role in various industrial applications, particularly in biodiesel fuel production [2].Soybean contains about 18% oil and 38% protein, which constitutes 60% of the global vegetable protein provision [3], and it has become one of the most economically important oilseed and biodiesel crops.China's importance as a world soybean producer has steadily declined in the last decade despite its domestication and continued high consumption [4].According to data from the General Administration Foods 2023, 12, 4227 2 of 16 of Customs of China, China is now the world's largest importer of soybeans.In 2020, China imported a total of 100.33 million tons of soybeans, which is the highest since 2016, yet produced only 19.6 million tons of soybeans that year.The total import of soybeans into China declined to 96.52 million tons in 2021 and 91.081 million tons in 2022, which was a decrease of 5.6% compared to 2021 [5].This was mainly due to the rise of global soybean prices and the reluctance of the United States of America (USA) to export soybeans.In terms of soybean exporting countries, in 2022, Brazil replaced the USA as the top exporter, accounting for 59.73% of the export market, followed by the USA, accounting for 32.42%, and Argentina, accounting for 4.01%.The remaining 1.88% of soybean exports mainly come from countries such as Russia, Canada, Ukraine, Ethiopia, and Tanzania.Differences in growth environment, cultivar variety, and agricultural practices can all lead to varying quality parameters of soybeans.Moreover, different processing conditions such as moisture, temperature, and drying time lead to different nutritional components and quality parameters of soybeans [6].Compared with large-scale western mechanized planting, soybean cultivation costs are higher in China due to higher land cost or rental and labor charges, which lowers the cost competitiveness of Chinese soybean and increases its price in the open market.Soybeans are highly monitored crops due to China's food security policies, yet because of different soybean trade policies, the quality and price of imported soybeans vary considerably.Therefore, developing soybean origin traceability technology is of high importance for protecting China's soybean import trade and maintaining safety standards for imported soybeans.
Food fraud and adulteration have increased significantly in recent times, motivated by rapid economic gains.The consumption of labeled, branded, and certified food has become a major trend driven by changes in consumer attitudes towards nutritional health and food safety [7].In 1992, the European Union (EU) proposed that products from specific geographical origins should be granted Geographical Indications (PGI) [8].This initiative enables consumers to identify quality products, while producers use the label as an added marketing tool to increase crop value [9].In China, about 40 percent of its soybeans are cultivated in Heilongjiang Province [10], which is widely recognized as the highest-quality product and has been authorized as a PGI product in China.Origin mislabeling leads to increased profits for sellers and increases the risk of fraud for consumers.PGI protection through geographical origin verification tools is now essential for all at-risk products.Metabolomic studies aim to study differences among food products and predict class memberships using statistical models, which are primely discriminative and predictive.Therefore, obtaining food fingerprint information through appropriate technologies can ensure quality and safety and fulfill origin identification purposes [11].
Methods used to determine food geographical origins, such as stable isotopes and elemental content analysis, have been recognized as particularly useful origin discriminators as the elemental composition and stable isotope signatures (δ 2 H, δ 18 O, δ 15 N, δ 13 C, and δ 34 S) of plants reflect their growing environment, climate, and human interactions.Different photosynthetic pathways of plants and their cultivars have different biological effects on carbon dioxide respiration processes and uptake.These differences in plant tissue 13 C/ 12 C ratios are the main determinant of bulk δ 13 C signatures [12].The δ 15 N values of naturally growing plants are mainly related to plant organs, plant age, climatic conditions, biological N 2 fixation, and soil properties, and they are also susceptible to nitrogen fertilization [13].Plant δ 34 S values are influenced by soil elements (e.g., synthetic SO 4 2− fertilizers, gypsum, and elemental sulfur S o ), atmospheric deposition (e.g., rainwater and industrial pollution), and sea spray in some coastal regions [14,15].Other factors influencing δ 34 S values include soil sulfate reduction by anaerobic bacteria on a range of diffuse sources [16].Stable isotope values of hydrogen and oxygen are basically used to determine provenance, and these values are the results of isotopic fractionation produced during the meteorological water cycle that constitutes the evaporation, condensation, and precipitation of groundwater and shows systematic geographic isotopic variation [9,17].water effects, such as season, local climate, and geographical factors (e.g., latitude, altitude, and distance from the sea) [18].Previous multi-element analysis established that the mineral element compositions of organisms are directly affected by regional climatic conditions, biological and environmental interactions, and metabolism [19].Soil conditions (e.g., soil type, mineral content, pH, porosity, and humus content) are the main factors affecting the mineral content of plants [20].ICP-MS (Inductively Coupled Plasma Mass Spectrometry) has the advantages of high sensitivity and short-time detection and is suitable for the simultaneous analysis of multiple mineral elements [19].
To date, there are no studies that have generated an isotope or elemental database suitable for verifying the geographical origin of domestic and imported soybeans into China.The objective of this study was to examine the chemometric discrimination ability for the geographical origin of soybeans using stable isotopes (δ 2 H, δ 18 O, δ 15 N, δ 13 C, and δ 34 S), non-metallic element content (% N, % C, and % S) and elemental profiles.The fingerprint information acquired by Isotope Ratio Mass Spectrometry (IRMS) and Inductively Coupled Plasma Mass Spectrometry (ICP-MS) were combined with multivariate statistical techniques such as PCA, LDA, and ANN to distinguish the geographical origin of four origins, including domestic and imported soybeans.We identified individual isotope trends and undertook multi-variate analyses to establish the most useful discriminatory chemometric models for country of origin verification.This study proposes a combined approach to trace the geographical origin of soybeans in order to solve the phenomenon of mislabeled countries of origin in Chinese markets and improve traceability efficiency in the event of fraud or food safety concerns.

Sample Collection and Preparation
All soybean samples were collected in 2020 from ports such as Shenzhen Port, Zhangjiakou Port, and others that are primarily involved in importing to China and analyzed by Shenzhen Customs, which resulted in the collection of 156 soybean samples from 4 countries: Brazil (n = 90), China (n = 33), USA (n = 6), and Argentina (n = 27).Soybean samples were dried to facilitate storage and transportation.The specific sampling information is shown in Table 1.Over 1 kg of soybeans were acquired for each sample, and samples were stored at room temperature (24 to 28 °C) in the laboratory until analysis.Soybean samples were washed with deionized water (Milli-Q water system, Darmstadt, Germany), dried again in an oven at 55 °C for 24 h to a constant weight, then ground to a fine powder (<80 μm) using a grinding mill (MM400, Retsch, German).Samples were stored in a desiccator for further stable isotopes and elemental profile analysis.
Nd is a powerful tool to discriminate the geographic location of clams.Studies have reported the application of terahertz (THz) spectroscopy [30], energy dispersive X-ray fluorescence spectrometry and near-infrared reflectance spectroscopy [31], multi-elemental analysis by energy dispersion X-ray fluorescence spectrometry combined with multivariate statistical analysis to discriminate the geographical origin of soybeans [32], and Chinese soybean paste characterization based on flavor profiles using HS-SPME-GC/MS, E-nose, and E-tongue combined with chemometrics [33].The geographical origin of commercial soybeans sold in Vietnam and Argentina was explored using ICP-MS [34,35], and stable carbon and nitrogen isotopic compositions of soybeans from various cultivated regions of China were determined [36].
To date, there are no studies that have generated an isotope or elemental database suitable for verifying the geographical origin of domestic and imported soybeans into China.The objective of this study was to examine the chemometric discrimination ability for the geographical origin of soybeans using stable isotopes (δ 2 H, δ 18 O, δ 15 N, δ 13 C, and δ 34 S), non-metallic element content (% N, % C, and % S) and elemental profiles.The fingerprint information acquired by Isotope Ratio Mass Spectrometry (IRMS) and Inductively Coupled Plasma Mass Spectrometry (ICP-MS) were combined with multivariate statistical techniques such as PCA, LDA, and ANN to distinguish the geographical origin of four origins, including domestic and imported soybeans.We identified individual isotope trends and undertook multi-variate analyses to establish the most useful discriminatory chemometric models for country of origin verification.This study proposes a combined approach to trace the geographical origin of soybeans in order to solve the phenomenon of mislabeled countries of origin in Chinese markets and improve traceability efficiency in the event of fraud or food safety concerns.

Sample Collection and Preparation
All soybean samples were collected in 2020 from ports such as Shenzhen Port, Zhangjiakou Port, and others that are primarily involved in importing to China and analyzed by Shenzhen Customs, which resulted in the collection of 156 soybean samples from 4 countries: Brazil (n = 90), China (n = 33), USA (n = 6), and Argentina (n = 27).Soybean samples were dried to facilitate storage and transportation.The specific sampling information is shown in Table 1.Over 1 kg of soybeans were acquired for each sample, and samples were stored at room temperature (24 to 28 • C) in the laboratory until analysis.Soybean samples were washed with deionized water (Milli-Q water system, Darmstadt, Germany), dried again in an oven at 55 • C for 24 h to a constant weight, then ground to a fine powder (<80 µm) using a grinding mill (MM400, Retsch, German).Samples were stored in a desiccator for further stable isotopes and elemental profile analysis.
Around 3 mg of each powdered soybean sample was weighed in duplicate into a 4 × 6 mm tin capsule for the simultaneous determination of δ 15 N, δ 13 C, δ 34 S and N, C, S content using a micro-balance (P6, Mettler, Switzerland) and tightly wrapped.The powdered samples were placed in an autosampler and converted to gases (N 2 , CO 2 , and SO 2 ) in the EA.Evolved gases were passed through a drying tube containing Mg(ClO 4 ) 2 .The EA combustion parameters were as follows: reaction oven: 1020 • C, column temperature: 70 • C, O 2 flow: 175 mL/min, oxygen injection time: 3 s; carrier flow (helium gas): 200 mL/min.Around 0.2 mg of each powdered soybean sample was weighed in triplicate into a 3.5 × 5 mm silver capsule for the simultaneous determination of δ 2 H and δ 18 O.Soybean samples and reference materials were exposed to local atmospheric conditions to equilibrate exchangeable water in the laboratory for at least 48 h before analysis.TC/EA parameters were as follows: furnace temperature: 1350 • C, column temperature: 90 • C, carrier flow (Helium gas): 110 mL/min.
The isotope ratio is denoted in delta notation (δ) in accordance with the following formula: where i and j denote the highest and the lowest atomic mass number of element E, respectively.R P and R Ref represent the ratio of the heavier and the lighter element ( 13 C/ 12 C, 15 N/ 14 N, 34 S/ 32 S, 2 H/ 1 H, and 18 O/ 16 O) of the sample and reference materials, respectively [38].The delta values applied to the International System of Units (SI) were expressed in units of "per mil" (‰) or "miliurey" (mUr).In this study, we choose "‰" as the isotope values unit.The δ 13 C values were expressed as relative to Vienna-Pee Dee Belemnite (V-PDB), δ 15 N to the air (air) scale, δ 34 S to Vienna-Canyon Diablo Troilite (V-CDT) scale, while δ 2 H and δ 18 O were expressed relative to the Vienna-Standard Mean Ocean Water (V-SMOW) scale.The isotope values of soybean samples were calculated against international standard reference materials as follows: USGS43 (Indian human hair powder, δ 13 C V-DPB = −21.28‰,δ 15 N Air = 8.44‰, δ 34 S V-CDT = 10.46‰) was used for δ 13 C, δ 15 N and δ 34 S calibration, Sulfadiazine (C = 41.81%,N = 16.25%,S = 18.62%) was used as the quality control sample for elemental content of carbon, nitrogen, and sulfur (C %, N % and S %).USGS43 (Indian human hair powder, δ 2 H V-SMOW = −50.3‰,δ 18 O V-SMOW = +14.11‰),USGS54 (Canadian lodgepole pine wood powder, δ 2 H V-SMOW = −150.4‰,δ 18 O V-SMOW = +17.79‰),USGS56 (South African red ivory wood powder, δ 2 H V-SMOW = −44‰, δ 18 O V-SMOW = +27.23‰)were used for the δ 2 H and δ 18 O calibrations.Blanks were used, and standards were inserted for drift correction after every eight samples.The accuracy of carbon, nitrogen, and sulfur stable isotopes is ±0.1‰, ±0.2‰, and ±0.4‰, respectively.The accuracy of hydrogen and oxygen stable isotopes was ±1.2‰ and ±0.3‰, respectively.

Elemental Analysis
A total of 23 macro, micro, and trace elements (Na, Mg, Al, K, Ca, V, Cr, Mn, Fe, Co, Ni, Cu, Zn, Ga, As, Se, Rb, Sr, Cd, Cs, Ba, Tl, Pb) were quantified using Inductively Coupled Plasma Mass Spectrometry (ICP-MS, Agilent, 8800, Agilent Technologies, Tokyo, Japan).Around 0.5 g of powdered soybean was weighed and digested using a microwave digestion system.Samples were digested in Teflon digestion vessels containing a mixture of 6.0 mL of HNO 3 (AR, 65%, w/w, Merck, Darmstadt, Germany) and 2.0 mL of H 2 O 2 (AR, 20%, w/w, Merck, Darmstadt, Germany, Merck) for 2 h.The digestion temperature program was as follows: from 0 • C to 100 • C in 10 min, increasing to 180 • C in 15 min, and increasing to 240 • C in 15 min before being kept at 240 • C for 10 min.After mineralization, the obtained extracts were redissolved to a total volume of 25 mL with double Milli-Q water (MΩ > 18.2 M) and subsequently analyzed by ICP-MS.A multi-element standard solution (ICP-MS-CAL2-1, AccuStandard, Thermo Fisher, Bremen, Germany) in 0.5% nitric acid (chromatographically pure) was used to determine the sensitivity factors of all elements over the entire mass range measured by semi-quantitative mode dilution samples.

Statistical Analysis
Data processing was accomplished using SPSS 24.0 for macOS (IBM, Armonk, New York, NY, USA), and the results were expressed as the mean value ± standard deviation.GraphPad Prism 8.4.0 for macOS (GraphPad Software Inc., San Diego, CA, USA) was used to produce the figures.One-way analysis of variance (ANOVA) followed by the post hoc Tukey's HSD was used to evaluate the differences in the samples with the isotope and elemental variations from the four countries, respectively.This study considered a statistically significant difference occurred at a p-value ≤ 0.05.Principal component analysis (PCA) was used in the analysis process to reduce the dimensionality of the dataset and describe the variability of the system with fewer variables.Linear discriminant analysis (LDA) was used to find the linear combinations of noted attributes that can clearly separate two or more classes of objects [39].Backpropagation artificial neural networks (BP-ANN) can learn and store a large number of input-output pattern maps.It is a kind of mathematical equation that does not need to reveal the mapping in advance.PCA, LDA, and BP-ANN analysis were implemented in SPSS 24.0 for macOS (IBM, Armonk, New York, NY, USA).

Isotopes and Non-Metallic Elements of Soybeans from Different Geographical Origins
Stable isotope and elemental profiles of imported and Chinese domestic soybean samples were analyzed and summarized by mean value ± standard deviation in Table 2. Box-Whisker plots of stable isotopes and non-metallic element contents are shown in Figures 1 and 2. Analysis of bulk stable isotope and non-metallic elements using ANOVA showed there were significant differences (p < 0.05) among eight variables from the four origins (Brazil, China, the USA, and Argentina).The δ 2 H, δ 18   Soybean δ 13 C values were within the expected range for C 3 plants, as δ 13 C values of all soybean samples ranged from −30.0‰ to −26.0‰.Mean δ 13 C values of soybean samples from the USA (−26.2 ± 0.3‰) were slightly more positive than other origins, and δ 13 C values of soybean samples from Argentina had the most negative mean δ 13 C values (−27.6 ± 0.6‰).Some environmental factors can affect the δ 13 C values of plants, such as drought, solar radiation intensity, low temperature, low pressure, and ozone stress [40].Previous studies proved that under the influence of temperature, light intensity, and humidity, the δ 13 C values of plants in humid cultivation areas were more negative [21].Argentina and Brazil are located in South America and receive abundant rainfall, with most areas receiving more than 1000 mm of precipitation per year, making it the wettest continent in the world.Conversely, USA soybeans are grown at higher latitudes with a cooler climate, which reduces stomatal conductance during photosynthesis and discriminates against 13 C.The average of δ 15 N values of Chinese soybeans (1.6 ± 1.7‰) was consistent with pre viously reported results [36,41].The maximum mean δ 15 N value of soybeans was 2.2 0.2‰ from the USA, and the minimum mean δ 15 N value was found in Brazilian and A gentinian soybeans with an average of 0.4‰ and 0.5%.The use of nitrogen fertilizer generally not recommended for legumes because, under favorable conditions, legume are able to grow well from existing soil N plus N2 derived from symbiotic nitrogen fixatio [42].In order to minimize soybean production costs, high natural soil fertility is particu larly important to ensure good soybean quality.The δ 15 N value of soybeans from Braz and Argentina is close to that of atmospheric N2 (0‰ by definition), and it is interprete that these values result from the fractionation of the legume-Rhizobium symbiosis durin biological nitrogen fixation (BNF), which is generally small.This results in the δ 15 N valu of the legume approaching zero when there is high symbiotic dependence.[13,43].More over, the depleted nitrogen isotopic composition of soybeans in Brazil and Argentina sug gests the utilization of synthetic fertilizers in their cultivation.USA soybeans are typicall grown in deep "corn belt" soils where the soil is more fertile [44].The natural climate an environmental conditions are suitable for soybean cultivation.Most USA soybean-pro ducing areas have low and flat terrain.Fertile black grassland soil and chernozem soil ar   The average of δ 15 N values of Chinese soybeans (1.6 ± 1.7‰) was consistent with pr viously reported results [36,41].The maximum mean δ 15 N value of soybeans was 2.2 0.2‰ from the USA, and the minimum mean δ 15 N value was found in Brazilian and A gentinian soybeans with an average of 0.4‰ and 0.5%.The use of nitrogen fertilizer generally not recommended for legumes because, under favorable conditions, legume are able to grow well from existing soil N plus N2 derived from symbiotic nitrogen fixatio [42].In order to minimize soybean production costs, high natural soil fertility is particu larly important to ensure good soybean quality.The δ 15 N value of soybeans from Braz and Argentina is close to that of atmospheric N2 (0‰ by definition), and it is interprete that these values result from the fractionation of the legume-Rhizobium symbiosis durin biological nitrogen fixation (BNF), which is generally small.This results in the δ 15 N valu of the legume approaching zero when there is high symbiotic dependence.[13,43].Mor over, the depleted nitrogen isotopic composition of soybeans in Brazil and Argentina sug gests the utilization of synthetic fertilizers in their cultivation.USA soybeans are typicall grown in deep "corn belt" soils where the soil is more fertile [44].The natural climate an environmental conditions are suitable for soybean cultivation.Most USA soybean-pro ducing areas have low and flat terrain.Fertile black grassland soil and chernozem soil ar widely distributed in the USA, resulting in the highest mean δ 15 N values of soybeans o The average of δ 15 N values of Chinese soybeans (1.6 ± 1.7‰) was consistent with previously reported results [36,41].The maximum mean δ 15 N value of soybeans was 2.2 ± 0.2‰ from the USA, and the minimum mean δ 15 N value was found in Brazilian and Argentinian soybeans with an average of 0.4‰ and 0.5%.The use of nitrogen fertilizer is generally not recommended for legumes because, under favorable conditions, legumes are able to grow well from existing soil N plus N 2 derived from symbiotic nitrogen fixation [42].In order to minimize soybean production costs, high natural soil fertility is particularly important to ensure good soybean quality.The δ 15 N value of soybeans from Brazil and Argentina is close to that of atmospheric N 2 (0‰ by definition), and it is interpreted that these values result from the fractionation of the legume-Rhizobium symbiosis during biological nitrogen fixation (BNF), which is generally small.This results in the δ 15 N value of the legume approaching zero when there is high symbiotic dependence.[13,43].Moreover, the depleted nitrogen isotopic composition of soybeans in Brazil and Argentina suggests the utilization of synthetic fertilizers in their cultivation.USA soybeans are typically grown in deep "corn belt" soils where the soil is more fertile [44].The natural climate and environmental conditions are suitable for soybean cultivation.Most USA soybeanproducing areas have low and flat terrain.Fertile black grassland soil and chernozem soil are widely distributed in the USA, resulting in the highest mean δ 15 N values of soybeans of the four studied origins.δ 15 N values typically rely on agricultural practices, such as the type of fertilizer applied (organic or synthetic) or agricultural management.Soybeans cultivated in China exhibit the largest δ 15 N standard deviation, potentially due to the different cultivation methods across China.It is possible that certain Chinese soybean samples have been cultivated using organic farming instead of conventional practices.Nonetheless, this hypothesis could not be substantiated due to the unavailability of soil and fertilizer data.
δ 34 S values of plants do not clearly distinguish production methods (conventional versus organic) or fertilizer type but are generally correlated with local geological and soil conditions [13,45].Chinese and Brazilian soybeans had the highest mean δ 34 S values (6.1‰ and 5.9‰), while the USA had the lowest mean δ 34 S values (−3.0‰).The δ 34 S values of USA soybeans had the widest isotopic distribution range compared to other origins, possibly due to a wider range of different geological sulfur sources and cultivation environments found in the Midwestern states of Iowa, Illinois, Minnesota, and Indiana (order of top producing states) [46].
Soybean δ 2 H and δ 18 O values were significantly different among the four origins.Argentinean soybean had the most positive δ 2 H and δ 18 O values (−108.6‰and 24.7‰, respectively).The lowest mean soybean δ 2 H value was found in the USA (−145.8‰),and the lowest mean soybean δ 18 O value was found in China (16.7‰).The climate of Argentina's agricultural regions is more tropical (humid and warm) than other countries.In general, geographical temperature fractionation effects exist, where the concentration of 18 O decreases with increasing latitude and cooler temperatures [23].However, due to the lack of information about precise sampling locations, the effects of altitude and latitude are not clear in this study.Results suggest that Argentinian soybeans may be produced at lower altitudes or in warmer regions, closer to the coast than the other origins.Lower δ 2 H and δ 18 O values for Chinese soybeans suggest they are grown inland and potentially in cooler regions than the other countries.A comparison between δ 2 H and δ 18 O values in soybean samples from four origins is shown in Figure 3. Overall, δ 2 H and δ 18 O values of soybeans show a reasonable overall linear relationship (R 2 = 0.512), although soybean isotope values cluster into their respective origins.
, x FOR PEER REVIEW 8 of 15 cultivated in China exhibit the largest δ 15 N standard deviation, potentially due to the different cultivation methods across China.It is possible that certain Chinese soybean samples have been cultivated using organic farming instead of conventional practices.Nonetheless, this hypothesis could not be substantiated due to the unavailability of soil and fertilizer data.δ 34 S values of plants do not clearly distinguish production methods (conventional versus organic) or fertilizer type but are generally correlated with local geological and soil conditions [13,45].Chinese and Brazilian soybeans had the highest mean δ 34 S values (6.1‰ and 5.9‰), while the USA had the lowest mean δ 34 S values (−3.0‰).The δ 34 S values of USA soybeans had the widest isotopic distribution range compared to other origins, possibly due to a wider range of different geological sulfur sources and cultivation environments found in the Midwestern states of Iowa, Illinois, Minnesota, and Indiana (order of top producing states) [46].
Soybean δ 2 H and δ 18 O values were significantly different among the four origins.Argentinean soybean had the most positive δ 2 H and δ 18 O values (−108.6‰and 24.7‰, respectively).The lowest mean soybean δ 2 H value was found in the USA (−145.8‰),and the lowest mean soybean δ 18 O value was found in China (16.7‰).The climate of Argentina's agricultural regions is more tropical (humid and warm) than other countries.In general, geographical temperature fractionation effects exist, where the concentration of 18 O decreases with increasing latitude and cooler temperatures [23].However, due to the lack of information about precise sampling locations, the effects of altitude and latitude are not clear in this study.Results suggest that Argentinian soybeans may be produced at lower altitudes or in warmer regions, closer to the coast than the other origins.Lower δ 2 H and δ 18 O values for Chinese soybeans suggest they are grown inland and potentially in cooler regions than the other countries.A comparison between δ 2 H and δ 18 O values in soybean samples from four origins is shown in Figure 3. Overall, δ 2 H and δ 18 O values of soybeans show a reasonable overall linear relationship (R 2 = 0.512), although soybean isotope values cluster into their respective origins.Elemental nitrogen, carbon, and sulfur (% N, % C, and % S) of imported and domestic soybean samples ranged from 5.2% to 8.7%, 45.8% to 54.3%, and 0.1% to 1.0%, respectively.Average % N, % C, and % S contents were 6.2%, 50.4%, and 0.3%, respectively.A significant difference in N, C, and S elemental compositions of soybean from four origins was seen (p < 0.05).USA soybeans had the lowest N, C, and S element contents among the  Elemental nitrogen, carbon, and sulfur (% N, % C, and % S) of imported and domestic soybean samples ranged from 5.2% to 8.7%, 45.8% to 54.3%, and 0.1% to 1.0%, respectively.Average % N, % C, and % S contents were 6.2%, 50.4%, and 0.3%, respectively.A significant difference in N, C, and S elemental compositions of soybean from four origins was seen (p < 0.05).USA soybeans had the lowest N, C, and S element contents among the four producing countries, while Chinese soybeans had the highest elemental contents.Some soybean samples from the USA may have originated from the northern states, which have cooler temperatures and less sunshine, resulting in lower nitrogen, carbon, and sulfur storage in plant tissues.Significant elemental differences in soybeans found in the four origins indicate that these elemental contents may contribute towards distinguishing geographical origin.

Mineral Element Compositions of Soybeans from Different Geographical Origins
Mineral element abundance is shown in Table 2.The relative elemental concentrations were quite variable among soybeans from different countries but comparable with elements found in USA and Brazilian soybeans from previous studies [34].The concentration range of most mineral elements from the four origins overlapped.Statistically significant differences (p < 0.05, same below) linked to the geographical origin of the samples were found for all 23 elements except Mg, Fe, Cu, and Pb.Soybean Mg and Cu contents were found to be effective origin indicators of Chinese and Japanese soybeans [47], but these elements were not considered important origin discriminators in this study, possibly due to the different soybean origins that were chosen.The most abundant soybean elements found in this study were K, Ca, and Mg.Soybean K content ranged between 314,785.98 µg/kg and 564,631.18µg/kg, with an average value of 413,278.10 µg/kg.Histograms were made by standardizing each mineral element variable with its Z-score (Figure 4) in order to distinguish composition differences among the four production origins.
Elemental contents of Al, V, Co, Ga, Rb, Cs, and Tl were significantly higher in Brazilian soybeans than in the other three origins.Brazilian soybeans had the lowest concentrations of K, and Brazilian and Chinese soybeans had lower concentrations of AS, Se, and Cd than USA and Argentinean soybeans.Brazilian and Argentinean soybeans had lower concentrations of Ni.Chinese soybeans had significantly higher concentrations of Ni and the lowest concentration of Na.The elemental contents of Cr, Mn, Zn, and Se were significantly higher in USA soybeans than in other origins.Soybeans cultivated in Argentina had the highest concentrations of Na, As, and Sr and the lowest concentrations of Mn.Significant differences in mineral element contents are attributed to different geological and environmental conditions, climatic uptake factors, and human interventions from each origin.
Soybean production in Brazil is concentrated in the "Cerrado" region, a prairie-like plain with less fertile soils that are rich in aluminum, highly acidic, and lack phosphorus and nitrogen [44].The low soil pH found in Brazil increases the bioavailability, and consequently the plant uptake, of aluminum and iron [48] relative to other origins.

Soybean Origin Model Evaluation and Classification Using Chemometrics
Chemometrics is an interdisciplinary method that employs multivariate statistics to extract essential information from intricate chemical systems and can now discern subtle differences in measurements among target products [49].In this study, we applied unsupervised and supervised pattern recognition methods, including PCA, LDA, and BP-ANN, to analyze soybean data.
due to the different soybean origins that were chosen.The most abundant soybean elements found in this study were K, Ca, and Mg.Soybean K content ranged between 314,785.98 µg/kg and 564,631.18µg/kg, with an average value of 413,278.10 µg/kg.Histograms were made by standardizing each mineral element variable with its Z-score (Figure 4) in order to distinguish composition differences among the four production origins.2. Different lowercase letters indicate significant differences were recorded in the soybeans from the four origins (p < 0.05).

Principal Component Analysis (PCA)
Unsupervised PCA was performed using a combination of stable isotopes and nonmetallic and mineral element contents.Overall, PCA confirmed the ability of isotopes and elements to provide unsupervised origin classification and discriminate the soybeans based on origin.The first six principal components (PCs) in the PCA model accounted for 80.49% of the total variance of the original variables, and the first three PCs explained 58.05% of the total variance of the original variables.Meanwhile, a 3D-Ordinations plot, constructed using the first three PCs, showed that soybean samples from four origins were generally well clustered without overlap (Figure 5a).PCA was shown to be a useful technique to initially determine the geographical origin of soybeans, but due to the lower total variance of the variables, other models were investigated.

Linear Discriminant Analysis (LDA) and BP-Artificial Neural Network (BP-ANN)
More meaningful classification models were constructed using LDA and BP-ANN in combination with stable isotope and element data.In order to establish a stronger origin classification model, identify effective classification variables, and better classify soybean samples from different origins, supervised linear discriminant analysis (LDA) and unsupervised BP artificial neural network analysis (BP-ANN) were performed using both stable isotopes and multi-element contents.
metallic and mineral element contents.Overall, PCA confirmed the ability of isotopes and elements to provide unsupervised origin classification and discriminate the soybeans based on origin.The first six principal components (PCs) in the PCA model accounted for 80.49% of the total variance of the original variables, and the first three PCs explained 58.05% of the total variance of the original variables.Meanwhile, a 3D-Ordinations plot, constructed using the first three PCs, showed that soybean samples from four origins were generally well clustered without overlap (Figure 5a).PCA was shown to be a useful technique to initially determine the geographical origin of soybeans, but due to the lower total variance of the variables, other models were investigated.LDA analysis of soybean samples from four origins revealed discriminant accuracies of 85.3% for stable isotopes only, 75.0% for non-metallic elemental contents only, and 100.0%for mineral elemental compositions only (Table 3 A-C).The discrimination accuracy for mineral elements of soybeans was higher than stable isotopes and non-metallic elements and was able to completely separate soybeans from the four origins.However, a single USA soybean sample was mis-classified as Argentinian, resulting in an accuracy rate of 99.4% in cross-validation.LDA results show that Brazilian soybeans had the lowest discriminant accuracy when using stable isotope variables only, and most of the mis-classified Brazilian soybeans were classed as Argentinian.This higher rate of misclassification using stable isotopes may be caused by a closer geographical proximity between Argentina and Brazil than the other two origins, and these two producing origins may share similar climate and farming conditions.In order to obtain a higher discriminant rate and more accurately distinguish the four soybean-producing origins, the three groups of variables were combined into a single linear discriminant analysis model.The results showed that a combined stable isotope and multi-element approach could completely distinguish soybeans from all origins, and the original accuracy and cross-validation accuracy were both up to 100% (Table 3

D).
A BP-ANN model was built with 70% training samples and 30% testing samples.Training and testing samples from each origin were selected randomly.The normalized importance for each elemental and stable isotope variable for the contribution to origin verification is shown in Figure 6.As, Cd, Se, % S, and Fe were identified as the top five important variables contributing to origin verification, and they all showed a strong ability to identify the origin of Chinese and imported soybeans.The ANN model was able to discriminate soybeans from all four origins, correctly identifying all durian origins with up to 100% training accuracy and 100% testing accuracy (Table 3 E).Classification accuracies of the BP-ANN model were found to be similar to the LDA model where the mineral element model provided the best soybean origin, although a combined approach with all variables completely distinguished soybeans from all origins.While successfully differentiating soybeans from four origins using stable isotopes and mineral elements combined with chemometrics, it is crucial to acknowledge that bulk stable isotope values and mineral elements represent overall averages of components.In specific cases, these values may not fully capture isotopic and elemental variations among different samples.Recently, there has been increasing attention paid to Compound-Specific Stable Isotope Analysis (CSIA) due to its ability to provide unique information about food components at the molecular level, reflecting geographical, cultivation, and natural ecosystem processes [50].Our forthcoming strategy aims to enhance origin discrimination by utilizing CSIA to discern isotopic variations among specific components (e.g., amino acids, fatty acids) in soybeans from different geographical origins.

Conclusions
Stable isotope and elemental compositions were used to develop an origin identification model to determine the geographical origin of soybeans from four countries.The model could be further used to improve the origin traceability of similar leguminous food products.Chinese soybeans and imported soybeans were characterized by chemometric exploratory techniques and classification parameters such as PCA, LDA, and ANN using stable isotopes (δ 2 H, δ 18 O, δ 15 N, δ 13 C, and δ 34 S), non-metallic element contents (% N, % C, and % S) and 23 mineral elements to verify their geographical origins.PCA results showed the four groups of soybean samples were distributed independently.A combined stable isotope, non-metallic, and mineral element contents data fusion approach combined with LDA afforded a more accurate soybean origin classification (with discriminant accuracy rates of 100% for both the testing and training sets) than a single analytical approach.Likewise, BP-ANN gave a 100% accuracy rate for both testing and training sets of four origin groups, and As, Cd, Se, % S, and Fe were identified as the top five important variables.An accurate origin identification tool using isotope and elemental fingerprints combined with chemometrics could avoid any potential conflict of interest arising from incorrect origin identification among soybeans.Regulatory uptake and use of soybean geographical origin models will prevent fraud or mislabeling at a global market level and could contribute to fairer international trading.
that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

FoodsFoods 1 Figure 1 .
Figure 1.Box-Whisker plots comparing four geographical origins for (a) δ 2 H, (b) δ 18 O, (c) δ 15 N, (d δ 13 C, and (e) δ 34 S compositions.Box plots display the lower, median, and upper quartile value using vertical lines to show the range from the 10 to 90 percentiles.Outliers are depicted as dot Different lowercase letters indicate that significant differences were recorded in the soybeans from the four origins (p < 0.05).

Figure 2 .
Figure 2. Box-Whisker plots of (a) % N, (b) % C, (c) % S of four geographical origins.Box plo display the lower, median, and upper quartile values, using vertical lines to show the range from the 10 to 90 percentiles.Outliers are depicted as dots.Different lowercase letters indicate that signi icant differences were recorded in the soybeans from the four origins (p < 0.05).

Figure 1 .
Figure 1.Box-Whisker plots comparing four geographical origins for (a) δ 2 H, (b) δ 18 O, (c) δ 15 N, (d) δ 13 C, and (e) δ 34 S compositions.Box plots display the lower, median, and upper quartile values, using vertical lines to show the range from the 10 to 90 percentiles.Outliers are depicted as dots.Different lowercase letters indicate that significant differences were recorded in the soybeans from the four origins (p < 0.05).

Foods 2023 ,Figure 1 .
Figure 1.Box-Whisker plots comparing four geographical origins for (a) δ 2 H, (b) δ 18 O, (c) δ 15 N, (d δ 13 C, and (e) δ 34 S compositions.Box plots display the lower, median, and upper quartile value using vertical lines to show the range from the 10 to 90 percentiles.Outliers are depicted as dot Different lowercase letters indicate that significant differences were recorded in the soybeans fro the four origins (p < 0.05).

Figure 2 .
Figure 2. Box-Whisker plots of (a) % N, (b) % C, (c) % S of four geographical origins.Box plo display the lower, median, and upper quartile values, using vertical lines to show the range fro the 10 to 90 percentiles.Outliers are depicted as dots.Different lowercase letters indicate that signi icant differences were recorded in the soybeans from the four origins (p < 0.05).

Figure 2 .
Figure 2. Box-Whisker plots of (a) % N, (b) % C, (c) % S of four geographical origins.Box plots display the lower, median, and upper quartile values, using vertical lines to show the range from the 10 to 90 percentiles.Outliers are depicted as dots.Different lowercase letters indicate that significant differences were recorded in the soybeans from the four origins (p < 0.05).

Figure 4 .
Figure 4. Histogram plots showing differences in 23 mineral compositions across four geographical origins.The concentrate of 23 mineral elements has been standardized in order to facilitate plotting,

Figure 4 .
Figure 4. Histogram plots showing differences in 23 mineral compositions across four geographical origins.The concentrate of 23 mineral elements has been standardized in order to facilitate plotting, and the original results are shown in Table2.Different lowercase letters indicate significant differences were recorded in the soybeans from the four origins (p < 0.05).

Figure 5 .
Figure 5. 3D-Ordinations of multivariate statistical analyses using a combined approach of stable isotopes, three non-metallic element contents, and 23 element compositions for four origins showing (a) principal component analysis (PCA) and (b) linear discriminant analysis (LDA).Each symbol type represents an individual origin.

15 Figure 6 .Table 3 .Figure 6 .
Figure 6.Stable isotopes and elemental composition variable importance for the contribution to wards origin verification using an ANN classification method.
used neodymium isotope ratio ( 143 Nd/ 144 Nd) to track the origin of Ruditapes philippinarum (clam) shells, and the results showed shell Foods 2023, 12, x FOR PEER REVIEW 3 of 15 such as season, local climate, and geographical factors (e.g., latitude, altitude, and distance from the sea) [18].Previous multi-element analysis established that the mineral element compositions of organisms are directly affected by regional climatic conditions, biological and environmental interactions, and metabolism [19].Soil conditions (e.g., soil type, mineral content, pH, porosity, and humus content) are the main factors affecting the mineral content of plants [20].ICP-MS (Inductively Coupled Plasma Mass Spectrometry) has the advantages of high sensitivity and short-time detection and is suitable for the simultaneous analysis of multiple mineral elements [19].
1Data from unpublished China customs yearbooks.

Table 2 .
One-way analysis of variance (ANOVA) of soybeans from four geographical origins using stable isotope and elemental contents.Element contents from ICP-MS are given in µg/kg (where 1 µg/kg = 1 ppb).The values in the table are expressed as mean ± standard deviation.
a-d Different superscript letter values indicate a significant difference among groups (p < 0.05).

Table 3 .
Variability in discriminatory accuracy for soybean origin classification based on stable isotope and elemental compositions using LDA and BP-ANN models with diverse parameters.