Effect of Different Accumulative Temperate Zones in Heilongjiang on Glycine Soja Metabolites as Analyzed by Non-Target Metabolomics

To study the effect of growth temperature on the nutritional components and metabolites of the wild soybean (Glycine soja), we analyzed the nutritional components and metabolic gases of the wild soybean in six accumulated temperature regions of the Heilongjiang Province, China, by gas chromatography–time-of-flight mass spectrometry (GC-TOF-MS). A total of 430 metabolites, including organic acids, organic oxides, and lipids, were identified and analyzed using multivariate statistical analysis, orthogonal partial least squares discriminant analysis, principal component analysis, and cluster analysis. Eighty-seven metabolites differed significantly in the sixth accumulated temperature region compared with the other five accumulated temperature regions. Forty of these metabolites (such as threonine (Thr) and lysine (Lys)) were elevated in soybeans from the sixth accumulated temperature zone compared with the other five accumulated temperature zones. Analysis of the metabolic pathways of these metabolites indicated that amino acid metabolism had the greatest influence on wild soybean quality. The results of the amino acid analysis were consistent with those of the GC-TOF-MS and showed that amino acids in wild soybeans from the sixth accumulated temperature zone differed significantly from those of the other zones. Threonine and lysine were the main substances driving these differences. The growth temperature affected the type and concentrations of metabolites in wild soybeans, and the GC-TOF-MS analysis of the effect of growth temperature on wild soybean metabolites was shown to be feasible.


Introduction
Wild soybean (Glycine soja Sieb. et Zucc.), also known as the horse seed bean or black bean, is the ancestor of the cultivated soybean Glycine max [1]. Wild soybeans have desirable properties, such as being light-loving, humidity-tolerant, shade-tolerant, drought-resistant, disease-resistant, and barren-resistant [2]. They grow especially well on saline-alkali soils with a soil pH of 9.18–9.23 and can survive winter at −41 °C [3]. With their strong saline-alkali and cold resistance, wild soybeans have unique genetic advantages; they are important germplasm resources for improving the traits of cultivated soybeans and also have ecological significance for maintaining species diversity [4]. Wild soybeans are distributed across all of China except the Xinjiang, Qinghai, and Hainan Provinces [5]. As the Heilongjiang Province is located in the high latitude region of China, its climate conditions are complex and temperatures vary greatly [6]. However, the soil is fertile and pollution-free and the concentrations of trace elements are the best in the country. As a result, the quality of wild soybean resources that grow naturally in this black soil is higher.
Cultivated soybean is an important dual-use crop, which is an important source of plant protein and plant oil, and the northeast region is the main production area of

The Main Nutritional Components
The nutritional contents of wild soybeans from different accumulated temperature zones are shown in Table 1. The crude protein (CP) content increased gradually with increasing accumulated temperature zone number, except in the fourth accumulated temperature zone, which may be due to the influence of different accumulated temperatures on the CP content during the crop growth period. The crude fat (EE) content of the wild soybeans showed the opposite trend to the CP content, with the EE content decreasing as the accumulated temperature zone number increased; this result was the same as that of cultivated soybeans. The crude fiber, crude ash (CA), and nitrogen-free extract (NFE) of the wild soybean protein were not affected by the accumulated temperature zone. Table 2 summarizes the types and concentrations of amino acids in wild soybeans from different accumulated temperature zones. As shown in Table 2, the amino acid contents of the wild soybeans from each accumulated temperature zone were different. The amino acid contents of wild soybeans from the sixth accumulated temperature zone were notably higher than those from the other accumulated temperature zones. This result may be because the sixth accumulated temperature zone has a lower average temperature and a dark brown soil; as wild soybeans have strong cold resistance and can thrive in black and newly accumulated soils, they are able to develop excellent characteristics in this zone. This result is consistent with the highest protein contents occurring in wild soybeans from the sixth accumulated temperature zone (Table 1).

Table fragment (column headers and the remaining rows were lost in extraction):
(unlabeled row) 0.368 ± 0.14 a, 0.258 ± 0.06 b, 0.231 ± 0.05 b, 0.206 ± 0.02 a, 0.177 ± 0.06 a, 0.120 ± 0.04 b, 0.119 ± 0.11 b, 0.059 ± 0.09 b, 0.036 ± 0.03 a
Sixth accumulated temperature zone: 0.389 ± 0.15 a, 0.285 ± 0.17 a, 0.253 ± 0.07 a, 0.235 ± 0.03 a, 0.165 ± 0.07 b, 0.122 ± 0.02 b, 0.168 ± 0.17 a, 0.047 ± 0.08 b, 0.018 ± 0.01 b
Different lowercase letters indicate a significant difference between treatments at the p < 0.05 level.

The Amino Acid Composition
The amino acid analysis of the wild soy protein isolates also showed that the cysteine (Cys) and methionine (Met) contents were low, which is mainly due to the loss of these two amino acids caused by acid hydrolysis in the amino acid determination. All nine essential amino acids were much higher than the recommended values from the FAO, which indicates that wild soybean protein isolates are plant proteins with rich nutritional and development value.

Gas Chromatography-Time of Flight Mass Spectrometry
Using the ChromaTOF software and the LECO-Fiehn Rtx5 database, 430 metabolites were analyzed. The total ion chromatogram (TIC) stack diagram of all quality control (QC) samples is shown in Figure 1. The peak areas and retention times of the TICs of the QC samples overlap well, indicating that the instrument was stable and that its analysis was reliable. The results showed that 255 metabolites were accurately identified in the sixth accumulated temperature zone, whereas 397 metabolites were annotated in the other five accumulated temperature zones. These metabolites were mainly amino acids, organic acids, organic oxides, lipids, and lipid-like molecules.

OPLS-DA Result
The results of the principal component analysis showed that no significant difference was present in metabolism between samples from the sixth accumulated temperature zone and those from the other accumulated temperature zones. In the OPLS-DA analysis, orthogonal variables in the metabolite data that were unrelated to the categorical variable were filtered out, and the non-orthogonal and orthogonal variables were analyzed separately, which yielded more reliable information on the intergroup differences of the metabolites and on their correlation with the experimental groups. Figure 2 shows the OPLS-DA score diagram of the sixth accumulated temperature zone and the other five accumulated temperature zones. All samples were within the confidence intervals. The difference between the two groups was evident (Figure 2). No overlap was present between the samples and the separation was high. The clustering of samples from the first to fifth accumulated temperature zones was strong. This may be because, although wild soybeans from these zones were affected by different growth temperatures, most of their metabolites were the same and their metabolite compositions were similar, so the sample points were close together or clustered.
Figure 3 shows the permutation test results of the OPLS-DA model. Each group of models had two principal components. The cumulative values were R²X = 0.213, R²Y = 0.983, and Q² = 0.543. The model R²Y was very close to 1, which demonstrated that the established model conformed to the real situation of the sample data. The original model can better explain the difference between the two groups of samples. Simultaneously, with the decrease in the permutation retention, the proportion of the permutation Y variable increased, whereas the Q² of the stochastic model decreased gradually. This result showed that the original model had good robustness and no over-fitting phenomena were present.
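As an illustration of the R²Y and Q² statistics reported for the model, the short sketch below uses the conventional definition of both quantities, 1 − (residual sum of squares)/(total sum of squares), where R²Y is computed from predictions of the model fitted to all samples and Q² from cross-validated predictions. The class labels and predicted values are invented for illustration and are not the study's data; the function name `explained_fraction` is hypothetical.

```python
def explained_fraction(observed, predicted):
    """Return 1 - SS_residual / SS_total for paired sequences."""
    mean_obs = sum(observed) / len(observed)
    ss_total = sum((y - mean_obs) ** 2 for y in observed)
    ss_resid = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1.0 - ss_resid / ss_total


# Hypothetical class labels (1 = sixth zone, 0 = other zones).
y = [1, 1, 1, 0, 0, 0]
# Hypothetical predictions from a model fitted to all samples.
fitted = [0.9, 1.0, 1.1, 0.1, 0.0, -0.1]
# Hypothetical predictions made for each sample while it was held out.
cross_validated = [0.6, 0.8, 0.7, 0.3, 0.2, 0.4]

r2y = explained_fraction(y, fitted)           # goodness of fit
q2 = explained_fraction(y, cross_validated)   # predictive ability

print(f"R2Y = {r2y:.3f}, Q2 = {q2:.3f}")
```

Because Q² is based on held-out predictions, it is normally lower than R²Y; a large gap between the two, or permuted models that score as well as the original, would point to over-fitting.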

Mining and Identification of Differential Metabolites
Differential metabolites were identified if the variable importance in projection (VIP) of principal component 1 of the OPLS-DA model was greater than 1 and the p value of the t test was ≤0.05. As shown in Table 3, 87 differential metabolites were screened out after comparing the sixth accumulated temperature zone with the other five accumulated temperature zones; these mainly included amino acids, sugars, fatty acids, organic acids, polyols, and other secondary metabolites. Amino acids were the main factors affecting differences in the nutritional composition of wild soybeans from different zones. Among the 87 differential metabolites, 40 metabolites (such as threonine (Thr) and lysine (Lys)) were elevated in soybeans from the sixth accumulated temperature zone compared with the other five accumulated temperature zones. The results showed that wild soybeans from the sixth accumulated temperature zone differed greatly in their amino acid composition and content from wild soybeans from the other five accumulated temperature zones. The concentrations of the other 47 metabolites in wild soybeans from the sixth accumulated temperature zone were lower than those in the other five accumulated temperature zones. Additionally, the contents of the differential metabolites varied greatly between samples, which shows that the metabolism of the wild soybean was highly variable. The metabolites were significantly affected by growth temperature, indicating that the wild soybean metabolites carried information specific to their accumulated temperature zone and that the differential metabolites can be used as the basis for distinguishing between accumulated temperature zones.
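The screening rule used in this section (VIP > 1 and p ≤ 0.05) can be sketched as a simple filter. This is a minimal illustration that assumes the VIP scores and t-test p values have already been computed; the record layout, the function name `select_differential`, and all numeric values are hypothetical and are not taken from Table 3.

```python
def select_differential(metabolites, vip_cutoff=1.0, p_cutoff=0.05):
    """Keep metabolites with VIP > vip_cutoff and p <= p_cutoff.

    Each record is a dict with the keys 'name', 'vip', 'p',
    'mean_sixth', and 'mean_others' (hypothetical layout).
    Returns (name, direction) pairs, where direction is 'up' when the
    mean in the sixth zone exceeds the mean in the other zones.
    """
    selected = []
    for m in metabolites:
        if m["vip"] > vip_cutoff and m["p"] <= p_cutoff:
            direction = "up" if m["mean_sixth"] > m["mean_others"] else "down"
            selected.append((m["name"], direction))
    return selected


# Invented example records, for illustration only.
records = [
    {"name": "threonine", "vip": 1.8, "p": 0.01, "mean_sixth": 2.0, "mean_others": 1.0},
    {"name": "lysine", "vip": 1.4, "p": 0.05, "mean_sixth": 1.5, "mean_others": 0.9},
    {"name": "metabolite A", "vip": 0.7, "p": 0.001, "mean_sixth": 3.0, "mean_others": 1.0},
    {"name": "metabolite B", "vip": 1.6, "p": 0.20, "mean_sixth": 0.5, "mean_others": 1.0},
    {"name": "metabolite C", "vip": 1.2, "p": 0.03, "mean_sixth": 0.4, "mean_others": 1.1},
]

differential = select_differential(records)
print(differential)
```

Both conditions must hold at once, so a metabolite with a very small p value but a VIP below 1 (metabolite A in this invented example) is not retained.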

Figure 4 shows the hierarchical clustering heat map of the total metabolites in the six accumulated temperature zones. The highly expressed substances were amino acids, such as alanine, aspartic acid, and glycine, which is consistent with the results in Table 3. The left side of the figure represents the sixth accumulated temperature zone and the right side represents the other five accumulated temperature zones. The sixth accumulated temperature zone was a single sample with a significant clustering effect, whereas the other five accumulated temperature zones, as a whole, displayed no significant clustering effect, which demonstrates that the metabolites of wild soybeans from different accumulated temperature zones contain information about their accumulated temperature zone. Furthermore, distinguishing the accumulated temperature zones of wild soybeans is feasible using their metabolite differences.

Metabolite Pathway
The MetPA database was used to analyze the metabolic pathways of the differential metabolites between groups under the conditions of p value = 1 after false positive correction. Through metabolic pathway enrichment and topological analysis, the metabolic pathways that may be biologically perturbed were identified. Figure 5 shows the metabolic pathway influencing factor map. Each point represents one metabolic pathway, and the map contains 20 points in total. These 20 metabolic pathways can be found in the KEGG database and include 10 amino acid metabolic pathways. Both synthetic pathways and decomposition pathways are numbered 1-10 in Table 4. A total of 49 differential metabolites involved in these metabolic pathways are shown in Table 5.

Table 4. List of factors affecting metabolic pathways.

Serial Number    Pathway
1    Glycine, serine, and threonine metabolism
2    Lysine biosynthesis
3    beta-Alanine metabolism
4    Alanine, aspartate, and glutamate metabolism
5    alpha-Linolenic acid metabolism
6    Valine, leucine, and isoleucine biosynthesis
7    Cysteine and methionine metabolism
8    Valine, leucine, and isoleucine degradation
9    Arginine and proline metabolism
10    Cyanoamino acid metabolism
11    Aminoacyl-tRNA biosynthesis
12    Methane metabolism
13    Vitamin B6 metabolism
14    Nicotinate and nicotinamide metabolism
15    Nitrogen metabolism
16    Pentose phosphate pathway
17    Carbon fixation in photosynthetic organisms
18    Glutathione metabolism
19    Biosynthesis of unsaturated fatty acids
20    Glucosinolate biosynthesis

Table 5 shows that these substances are the hubs connecting each pathway at the nodes of the metabolic pathway network. Amino acid metabolic pathways comprise half of all the retrieved metabolic pathways.

Figure 5. Factors affecting metabolic pathways.
Table 5. Details of the differential metabolites and metabolic pathways.
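The pathway enrichment step described in the Metabolite Pathway section can be illustrated with a hypergeometric over-representation test, which is one common way to score enrichment. Whether the MetPA analysis in this study used this particular test is an assumption. The totals of 430 detected and 87 differential metabolites come from the text above; the pathway size and the number of hits are invented for illustration, and the function name `hypergeom_enrichment_p` is hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(total, in_pathway, selected, hits):
    """Upper-tail hypergeometric probability P(X >= hits).

    total      -- number of detected metabolites (background)
    in_pathway -- background metabolites annotated to the pathway
    selected   -- number of differential metabolites
    hits       -- differential metabolites that fall in the pathway
    """
    upper = min(in_pathway, selected)
    tail = sum(
        comb(in_pathway, k) * comb(total - in_pathway, selected - k)
        for k in range(hits, upper + 1)
    )
    return tail / comb(total, selected)


# 430 and 87 are taken from the text; 12 and 6 are hypothetical.
p_value = hypergeom_enrichment_p(total=430, in_pathway=12, selected=87, hits=6)
print(f"enrichment p value: {p_value:.4g}")
```

A small p value means that the pathway contains more differential metabolites than would be expected by chance. The topological part of the analysis, which weights metabolites by their position in the pathway network, is not covered by this sketch.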

Discussion
The average temperature is the main climatic factor affecting the phenological period of most soybeans [13]. The differences between the accumulated temperature zones led to different qualities of wild soybean. This study confirmed that there are significant differences in the metabolites of the same variety of wild soybean growing in different accumulated temperature zones. The analysis showed that among the 87 differential metabolites identified in this study, 40 metabolites, such as Thr and Lys, were higher in soybeans from the sixth accumulated temperature zone than in the other five accumulated temperature zones. Thr and Lys are essential amino acids for the human body. Therefore, the amino acid composition and contents of wild soybeans in the sixth accumulated temperature zone are different from those in the other five accumulated temperature zones.
Amino acids are the basic components of proteins, and these substances are at the nodes of metabolic pathways and are the hubs connecting these pathways. Half of all the metabolic pathways identified in this study involved amino acids. The nutritional value of wild soybeans is mainly reflected in their amino acid contents, and wild soybeans are rich in nine essential amino acids required by the human body: leucine (Leu), phenylalanine (Phe), valine (Val), Thr, isoleucine, tyrosine, Lys, cysteine (Cys), and methionine (Met), which are much higher than the recommended values from the FAO. Among these, Leu had the highest concentration. Leucine can enhance immunity, activate human cells, and resist harmful bacteria and microorganisms. Isoleucine, Leu, Thr, Phe, Lys, Val, and Met are essential amino acids that are also important contributors to the nutritional value of wild soybeans.
Amino acid metabolism was found to be the main metabolic pathway of wild soybean in the different accumulated temperature zones of Heilongjiang Province. Lin H et al. also identified 169 differential metabolites from soybeans, which were mainly involved in key pathways, such as amino acid biosynthesis and catabolism, lipid oxidation, and secondary metabolite accumulation [14]; amino acid metabolism was the main pathway, and environmental factors had a great impact on the metabolism of wild soybean. Some studies have found that amino acid metabolites, such as proline, can also act as osmotic factors in plants to improve survival under stressors including drought and salt stress [15,16]. The differences in temperature, environment, and other factors between the accumulated temperature zones of Heilongjiang Province lead to significant differences in their metabolites. In addition, amino acid metabolism and lipid metabolism were reported to play key roles in the drought resistance of different soybean varieties [17]. In summary, the differential metabolites of wild soybeans from the different accumulated temperature zones of Heilongjiang Province were mainly amino acids. Amino acids are not only important nutritionally but are also regulatory factors involved in adapting to local environmental conditions.

Materials and Chemicals
For this study, wild soybeans were collected in 2019 from the first accumulated temperature zone (Daqing city), the second accumulated temperature zone (Jiamusi city), the third accumulated temperature zone (Muling city), the fourth accumulated temperature zone (Yichun city), the fifth accumulated temperature zone (Heihe city), and the sixth accumulated temperature zone (Greater Hinggan Mountains). These six accumulated temperature zones are beneficial to northeast China and were identified by Professor Mu Liqiang of the Northeast Forestry University. Plants with good growth and similar height were selected, and some mature seeds of good appearance and with no infections, diseases, or pests were cut and stored in air-permeable bags. Three samples were collected from each accumulated temperature zone, totaling 18 samples.
Chromatographic-grade methanol was purchased from the CNW Technology Company (Germany).

A Breakdown of the Division of the Heilongjiang Temperate Zone
A breakdown of the division of the Heilongjiang temperate zone is shown in Table 6 [18]. The nitrogen-free extract was calculated as follows [19]:
Nitrogen-free extract (%) = Dry matter (%) − (Crude protein (%) + Crude fat (%) + Crude fiber (%) + Crude ash (%))
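The nitrogen-free extract calculation above can be written as a one-line function. The input percentages in the example are invented for illustration and are not values from Table 1; the function name `nitrogen_free_extract` is hypothetical.

```python
def nitrogen_free_extract(dry_matter, crude_protein, crude_fat, crude_fiber, crude_ash):
    """Nitrogen-free extract (%) by difference; all inputs are percentages."""
    return dry_matter - (crude_protein + crude_fat + crude_fiber + crude_ash)


# Hypothetical composition, in percent.
nfe = nitrogen_free_extract(
    dry_matter=90.0,
    crude_protein=40.0,
    crude_fat=10.0,
    crude_fiber=5.0,
    crude_ash=5.0,
)
print(f"NFE = {nfe:.1f}%")
```

Because the value is obtained by difference, any measurement error in the four analyzed components carries over into the nitrogen-free extract.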

Analysis of Amino Acid Composition of Wild Soybean Protein Isolates
The amino acid determination method was as follows: a certain mass of the protein sample was added to 6 mol/L hydrochloric acid, hydrolyzed at 110 °C for 24 h, then filtered and evaporated. The residue was dissolved in citric acid buffer solution at pH 2.2, and a Hitachi 835-50 automatic amino acid analyzer was used for analysis.

Metabolite Extraction
The collected wild soybean samples with pods were air dried until the seeds cracked, after which the seeds were threshed naturally. After preliminary hulling, wild soybeans with full grains and no pests were selected and stored at −80 °C.
Derivatization: 20 ± 1 mg of the sample was placed in a 2 mL EP tube, then 500 µL of pre-cooled extraction solution (methanol:chloroform volume ratio = 3:1) was added. Next, 10 µL of ribitol was added and the mixture was vortexed for 30 s. Porcelain beads were added, and the samples were processed in a grinder at 45 Hz for 4 min and then sonicated for 5 min (in an ice water bath). The samples were centrifuged at 4 °C and 12,000 rpm for 15 min. Subsequently, 100 µL of the supernatant was carefully pipetted into a 1.5 mL EP tube, while 40 µL of each sample was mixed to form a QC sample. The metabolites were dried in a vacuum concentrator. After evaporation, 50 µL of methoxyamine hydrochloride (20 mg/mL in pyridine) was added and the samples were incubated at 80 °C for 30 min, then derivatized with 70 µL of BSTFA reagent (1% TMCS, v/v) at 70 °C for 1.5 h. After the samples had cooled gradually to room temperature, 5 µL of FAMEs (in chloroform) was added to the QC sample. All samples were then analyzed by a gas chromatograph coupled with a time-of-flight mass spectrometer (GC-TOF-MS) [20].

Statistical Analysis
Raw data analysis, including peak extraction, baseline adjustment, deconvolution, alignment, and integration [21], was performed with the ChromaTOF (V 4.3x, LECO) software, and the LECO-Fiehn Rtx5 database was used for metabolite identification by matching the